Introduction
Understanding how regular files and directories are structured internally is essential for comprehending UNIX file system operations. Both are represented by inodes, but their data blocks contain very different types of information.
---
Structure of Regular Files
A regular file's inode points to data blocks that contain the file's actual content (bytes of data).
How Data is Stored:
- Small Files (≤ 10 blocks):
- The inode's 10 direct block pointers point directly to the data blocks.
- No indirection is needed.
- Example: A 5 KB file (with 1 KB blocks) uses 5 direct pointers.
- Medium Files:
- When direct pointers are exhausted, the single indirect pointer is used.
- The indirect block contains pointers to more data blocks.
- Large Files:
- Double indirect and triple indirect pointers are used for very large files.
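- This multi-level indexing scheme scales to very large files: assuming 1 KB blocks and 4-byte block addresses (256 pointers per indirect block), it can address roughly 16 GB, although the size field in the inode may impose a lower limit.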
Example — Reading Byte at Offset 9000 (Block Size = 1024 bytes):
- Block number = 9000 / 1024, rounded down = 8 (0-indexed).
- Byte offset within block = 9000 % 1024 = 808.
- Block 8 is within the direct pointer range (0-9), so the kernel follows the direct pointer at index 8 (the ninth pointer) and reads byte 808 of that block.
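The arithmetic above can be sketched in a few lines of Python. This is an illustrative model, not kernel code: the function name `locate` is hypothetical, and the value of 256 pointers per indirect block assumes 4-byte block addresses in a 1 KB block.

```python
BLOCK_SIZE = 1024        # bytes per block, as in the example above
NUM_DIRECT = 10          # direct pointers in the inode
PTRS_PER_BLOCK = 256     # assumption: 4-byte block addresses in a 1 KB block


def locate(offset):
    """Return (indexing level, logical block number, byte offset within block)."""
    block = offset // BLOCK_SIZE      # integer division gives the 0-indexed block
    within = offset % BLOCK_SIZE      # remainder gives the position inside it

    if block < NUM_DIRECT:
        return ("direct", block, within)
    remaining = block - NUM_DIRECT
    if remaining < PTRS_PER_BLOCK:
        return ("single indirect", block, within)
    remaining -= PTRS_PER_BLOCK
    if remaining < PTRS_PER_BLOCK ** 2:
        return ("double indirect", block, within)
    remaining -= PTRS_PER_BLOCK ** 2
    if remaining < PTRS_PER_BLOCK ** 3:
        return ("triple indirect", block, within)
    raise ValueError("offset exceeds the maximum addressable file size")


print(locate(9000))      # ('direct', 8, 808)
print(locate(20000))     # ('single indirect', 19, 544)
```

The second call shows the first case where the direct pointers are not enough: logical block 19 lies past blocks 0-9, so it is reached through the single indirect block.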
Holes in Files:
- UNIX supports sparse files — files where some blocks are not allocated.
- If a program seeks past the current end of the file and writes there, the blocks in between are not allocated.
- Reading from a hole returns zeros.
- The file appears large, but only allocated blocks consume disk space.
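A small experiment can make the hole behaviour concrete. The sketch below uses only standard Python file calls; the helper names `make_sparse_file` and `read_at` are hypothetical. Whether the skipped blocks are really left unallocated depends on the underlying file system, but reads from the gap return zeros either way.

```python
import os
import tempfile

BLOCK_SIZE = 1024


def make_sparse_file(path, hole_blocks):
    """Write 4 bytes, seek far past the end, then write 4 more bytes."""
    with open(path, "wb") as f:
        f.write(b"head")
        f.seek(hole_blocks * BLOCK_SIZE)   # jump past the current end of file
        f.write(b"tail")                   # nothing was written in between


def read_at(path, offset, length):
    """Read `length` bytes starting at `offset`."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sparse.bin")
    make_sparse_file(path, hole_blocks=100)
    size = os.path.getsize(path)
    hole_sample = read_at(path, 5 * BLOCK_SIZE, 16)

print(size)          # 102404 (100 * 1024 + 4)
print(hole_sample)   # 16 zero bytes
```

On systems that report it, comparing `os.stat(path).st_blocks` with the file size is one way to check how much disk space was actually allocated.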
---
Structure of Directories
A directory is a special file whose data blocks contain a list of directory entries. Each entry maps a filename to an inode number.
Directory Entry Format (Traditional UNIX):
| Field | Size | Description |
|---|---|---|
| Inode Number | 2 bytes | Inode number of the file |
| Filename | 14 bytes | Name of the file (padded with null bytes) |
| Total | 16 bytes per entry | — |
- In traditional UNIX (System V), filenames are limited to 14 characters.
- Modern systems (BSD, Linux) support longer filenames with variable-length directory entries.
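The fixed 16-byte entry layout in the table can be modelled with Python's `struct` module. This is a minimal sketch: the little-endian byte order is an assumption (the table does not specify one), and `pack_entry` and `parse_directory` are hypothetical helper names. Entries whose inode number is 0 are treated as free slots and skipped.

```python
import struct

# Assumption: little-endian 2-byte inode number followed by a 14-byte name.
ENTRY = struct.Struct("<H14s")


def pack_entry(inode, name):
    """Build one 16-byte directory entry."""
    raw = name.encode("ascii")
    if len(raw) > 14:
        raise ValueError("filename longer than 14 bytes")
    return ENTRY.pack(inode, raw)      # struct pads short names with null bytes


def parse_directory(data):
    """Return the (inode, filename) pairs stored in a directory's data."""
    entries = []
    for inode, raw_name in ENTRY.iter_unpack(data):
        if inode == 0:                 # inode 0 marks an unused slot
            continue
        entries.append((inode, raw_name.rstrip(b"\x00").decode("ascii")))
    return entries


data = (
    pack_entry(100, ".")
    + pack_entry(50, "..")
    + pack_entry(101, "file1.txt")
    + pack_entry(0, "gone")            # a freed slot
    + pack_entry(120, "subdir")
)
print(ENTRY.size)                      # 16
print(parse_directory(data))
# [(100, '.'), (50, '..'), (101, 'file1.txt'), (120, 'subdir')]
```

Because every entry has the same size, the n-th entry always starts at byte offset 16 * n, which is what makes the traditional format simple to scan.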
Special Entries in Every Directory:
| Entry | Inode | Meaning |
|---|---|---|
| . | Current dir's inode | Reference to itself |
| .. | Parent dir's inode | Reference to parent directory |
Example Directory Contents:
Inode Filename
───── ──────────
100 .
50 ..
101 file1.txt
102 file2.txt
120 subdir
- The directory itself has inode 100.
- Its parent directory has inode 50.
- file1.txt is at inode 101, file2.txt at inode 102.
- subdir (a subdirectory) is at inode 120.
Deleting a File:
- When a file is deleted (rm file1.txt), the kernel:
- Sets the inode number in the directory entry to 0 (marks it as free).
- Decrements the link count in the file's inode.
- If link count reaches 0 and no process has the file open, the inode and data blocks are freed.
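The deletion steps above can be sketched as a toy in-memory model. Everything here is hypothetical and illustrative (the `Inode` class, its fields, and the `unlink` function), and `freed` stands in for releasing the inode and its data blocks.

```python
from dataclasses import dataclass


@dataclass
class Inode:
    link_count: int = 1     # number of directory entries naming this inode
    open_count: int = 0     # number of processes holding the file open
    freed: bool = False     # stands in for releasing inode and data blocks


def unlink(entries, inodes, name):
    """Remove `name` from a directory given as a list of [inode, filename]."""
    for entry in entries:
        if entry[0] != 0 and entry[1] == name:
            number = entry[0]
            entry[0] = 0                    # mark the directory slot as free
            inode = inodes[number]
            inode.link_count -= 1           # one fewer name for this file
            if inode.link_count == 0 and inode.open_count == 0:
                inode.freed = True          # no names and no open handles left
            return number
    raise FileNotFoundError(name)


inodes = {101: Inode(), 102: Inode(link_count=2)}
entries = [[100, "."], [50, ".."], [101, "file1.txt"], [102, "file2.txt"]]

unlink(entries, inodes, "file1.txt")
unlink(entries, inodes, "file2.txt")
print(entries[2], inodes[101].freed)   # [0, 'file1.txt'] True
print(entries[3], inodes[102].freed)   # [0, 'file2.txt'] False
```

In this example inode 102 survives because a second hard link still refers to it, so removing one name only lowers its link count to 1.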
How Path Resolution Works
To access /home/user1/file.txt:
- Start at the root inode (inode 2 by convention on many UNIX file systems).
- Read root's data blocks → find entry for home → get inode number (say 30).
- Read inode 30's data blocks → find user1 → get inode number (say 55).
- Read inode 55's data blocks → find file.txt → get inode number (say 101).
- Read inode 101 → access the file's data blocks.
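The walk above can be sketched with a toy lookup table. In this illustration each directory is a Python dict keyed by inode number, using the example inode numbers from the steps; the `DIRECTORIES` table and the `resolve` function are hypothetical, and a real kernel would read each directory's data blocks from disk instead.

```python
ROOT_INODE = 2

# Directory inode number -> {filename: inode number}
DIRECTORIES = {
    2:  {".": 2,  "..": 2,  "home": 30},
    30: {".": 30, "..": 2,  "user1": 55},
    55: {".": 55, "..": 30, "file.txt": 101},
}


def resolve(path):
    """Return the inode number that an absolute path refers to."""
    if not path.startswith("/"):
        raise ValueError("this sketch handles absolute paths only")
    current = ROOT_INODE
    for component in path.split("/"):
        if not component:                  # skip empty parts from "/" or "//"
            continue
        directory = DIRECTORIES.get(current)
        if directory is None:              # current inode is not a directory
            raise NotADirectoryError(component)
        if component not in directory:
            raise FileNotFoundError(component)
        current = directory[component]     # move on to the next inode
    return current


print(resolve("/home/user1/file.txt"))             # 101
print(resolve("/home/user1/../user1/./file.txt"))  # 101
```

The second call shows that . and .. need no special handling: they are ordinary directory entries, so the same lookup loop follows them.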
Summary
- Regular files store data in data blocks pointed to by the inode's block addresses.
- Directories store (inode number, filename) pairs in their data blocks.
- Every directory contains . and .. entries.
- Path resolution is a step-by-step process of reading directory entries to find the target inode.
- Sparse files (holes) are supported — unallocated blocks read as zeros.