A Brief Introduction to ext2

DONG Yuxuan

Basic Concepts

The disk is divided into blocks by the file system. The size of a block is specified while initialization, usually 1KB or 4KB. This article assumes it’s 1KB unless otherwise stated.

A file’s data is stored in blocks and a block is the smallest storage unit. If the size of the file is 2.5KB, we need 3 blocks to store it. Blocks of a file’s data may not be continuous. Blocks storing file data are called data blocks. Not all blocks are data blocks. Some blocks are used to store metadata like the inode table which will be introduced later.

Apparently, we need to know the metadata about which blocks are free, which are used. This metadata is stored as a bitmap called the block bitmap. When we write data to a file, we allocate free blocks as the data blocks of the file, then flag bits of these blocks as used in the block bitmap, then write data to blocks.

As blocks of a file may not be contiuous, how do we know which blocks belong to the file? Of course for each file we need maintain a data block list. The list is one of the file’s metadata. The data block list with other metadata like permissions together are stored in the structure called inode. The filename is not stroed in the data nor in the inode. It’s stored in the data of the directory containing the file. We will discuss this later.

As blocks have the block bitmap, inodes have the inode bitmap to indicate which is used, which is free.

As the inode doesn’t contain the filename, two files can have the same inode and this is how hard links work. If you create a hard link of the file. The link and the file itself will have one inode, so they have the same blocks, so they synchronize data and metadata like permissions with each other. Each inode maintains a reference count to record how many files are linked to it.

The inode is the entrance of a file. Most operations of the file need finding its inode as the first step. However, as an application programmer, we rarely face inodes directly. In the application level, the first step of visiting a file is alway opening it according to its filepath, or filename.

How does the file system find the inode by the filename? To understand it we must first understand what a directory really is.

A directory is also a file, its data is names and points to inodes of files and subdirectories in it. Each item (filename with inode pointer) is called a directory entry. You can run vi /usr to see it more clearly.

The root directory has a fixed inode number, usually 2. If we want to visit /usr/local/etc/kvdiff.txt, we find the inode of /, as I said it’s fixed, and find data blocks of /, then search the directory entries of / for usr to find the inode of /usr. Repeat this procedure until we find the inode of /usr/local/etc/kvdiff.txt. This procedure is so important that I must give it a name “the locating algorithm” for later references in the article.

Basic Operations

Now we have understood the baisc theory of file systems. We can use it to explain basic operations of files.

Application: Image Serve

Suppose you’re going to develop a web service to store users’ avatars. How should you arrange the directory structure? The most simple solution is putting all image files into one directory and naming files by their md5 values.

If you arrange so, how long will your system take to visit a file? The answer is \(\Theta(n)\), where \(n\) is the number of files, because to read or write a file, we must run the locating algorithm. It means to scan the directory entry list with length \(n\). If you have large number of files, it would be slow.

A smarter solution is to build a 32-levels structure. Each level represents a character in the 32-length md5 hex string. For example, the PNG file with md5 aaca0f5eb4d2d98a6ce6dffa99f8254b will be put in a/a/c/a/0/f/5/e/b/4/d/2/d/9/8/a/6/c/e/6/d/f/f/a/9/9/f/8/2/5/4/b.png.

In this structure, visiting a file will run 32 times of locating algorithm and each will scan at most 16 directory entries. So we do at most 16*32=512 times of filename comparisons.