The Physics of Files: Structs, Inodes & VFS
Why 'Everything is a File' is the most powerful abstraction in history. The physics of the FD Table, VFS function pointers, and the O(1) magic of Epoll.
🎯 What You'll Learn
- Deconstruct the `task_struct` FD Array
- Trace a `read()` call through the VFS (Virtual File System)
- Explain why `select()` is O(N) and `epoll` is O(1)
- Master 'Exotic' FDs: `timerfd`, `signalfd`, `eventfd`
- Debug `too many open files` with `lsof` and `/proc`
📚 Prerequisites
Before this lesson, you should understand:
Introduction
In Windows, a File is a File, a Socket is a Socket, and a Pipe is a Pipe. They have different APIs. In Linux, Everything is a File Descriptor.
This unification is the superpower of Unix. It means you can use the same read() and write() syscalls to talk to a text file, a TCP connection, a timer, or even a hardware signal.
This lesson explores the physics of this abstraction: How an integer 3 translates into a physical disk seek or a network packet.
The Physics: What is an Integer?
When you run int fd = open(...), the kernel returns a number like 3.
This is not a memory address. It is an Index.
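Inside the kernel, every process has a task_struct.
That struct points to a files_struct, which holds the descriptor table: an array of struct file * pointers (for small tables, the embedded struct file * fd_array[]).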
- Logic: fd_array[3] contains a pointer to the actual kernel object (struct file).
- Physics: This indirection allows the kernel to swap the “backend” without the application knowing.
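The index behaviour can be observed from user space. The following is a minimal sketch in Python, whose `os` module returns the raw integer descriptors on Linux; the scratch file and variable names are illustrative assumptions. It shows that two indices created with `dup()` share one `struct file` (and therefore one offset), while a fresh `open()` gets its own.

```python
import os
import tempfile

# Scratch file so there is something to open (the path is arbitrary).
tmp = tempfile.NamedTemporaryFile(delete=False)
tmp.write(b"hello world")
tmp.close()

fd_a = os.open(tmp.name, os.O_RDONLY)
fd_b = os.dup(fd_a)            # a second index pointing at the SAME struct file

# Both indices share one file offset because they share one struct file.
first = os.read(fd_a, 5)       # advances the shared offset to 5
second = os.read(fd_b, 6)      # continues from offset 5

# Closing frees the slot; the kernel normally hands out the lowest free index next.
os.close(fd_a)
fd_c = os.open(tmp.name, os.O_RDONLY)
slot_reused = (fd_c == fd_a)

# A fresh open() creates a new struct file, so its offset starts at 0 again.
third = os.read(fd_c, 5)

os.close(fd_b)
os.close(fd_c)
os.unlink(tmp.name)
print(fd_a, fd_b, fd_c, slot_reused, first, second, third)
```

In a single-threaded process `slot_reused` is normally `True`, but another thread opening or closing descriptors at the same moment could change which slot is free.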
The Unified Interface (VFS)
The struct file contains a pointer to struct file_operations. This is a table of function pointers.
struct file_operations {
    ssize_t (*read) (struct file *, char __user *, size_t, loff_t *);
    ssize_t (*write) (struct file *, const char __user *, size_t, loff_t *);
    int (*open) (struct inode *, struct file *);
    // ...
};
When you call read(3, buf, 10), the kernel does:
- Look up fd_array[3].
- Follow the pointer to struct file.
- Execute file->f_op->read(...).
If FD 3 is a Disk File on ext4, the call ends up in ext4_file_read_iter (modern filesystems implement the read_iter variant of the hook).
If FD 3 is a TCP Socket, the call ends up in tcp_recvmsg.
The application does not care.
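Seen from user space, the dispatch looks like this. The sketch below uses the same `os.read()` and `os.write()` calls on three different backends; the helper name `roundtrip` and the payload are assumptions made for illustration.

```python
import os
import socket
import tempfile

def roundtrip(write_fd, read_fd, payload=b"ping"):
    """Write to one FD and read back from another using plain write()/read()."""
    os.write(write_fd, payload)
    return os.read(read_fd, len(payload))

# 1. A pipe: two FDs backed by an in-kernel buffer.
pipe_r, pipe_w = os.pipe()
from_pipe = roundtrip(pipe_w, pipe_r)

# 2. A socket pair: two FDs backed by the socket layer.
sock_a, sock_b = socket.socketpair()
from_socket = roundtrip(sock_a.fileno(), sock_b.fileno())

# 3. A regular file: one FD for writing, a second open() for reading from offset 0.
file_w, path = tempfile.mkstemp()
file_r = os.open(path, os.O_RDONLY)
from_file = roundtrip(file_w, file_r)

for fd in (pipe_r, pipe_w, file_w, file_r):
    os.close(fd)
sock_a.close()
sock_b.close()
os.unlink(path)
print(from_pipe, from_socket, from_file)
```

The calling code is identical in all three cases; only the kernel object behind each integer differs.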
The Evolution of Waiting: Select vs Epoll
Servers need to handle thousands of concurrent connections (FDs). How do you know which one of 10,000 sockets has data ready?
The Old Way: Select (O(N))
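You give the kernel the full set of FDs on every call (select() uses a bitmap, usually capped at FD_SETSIZE = 1024; poll() takes an array without that cap). The kernel scans linearly through every entry to check its status, even if only one of them is ready.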
- Cost: O(N). Slow. Burns CPU just to find work.
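A small sketch of the select() pattern, using Python's `select.select` wrapper. The four socket pairs stand in for client connections and are a hypothetical setup, not part of the lesson.

```python
import select
import socket

# Four connected pairs stand in for client connections (hypothetical setup).
pairs = [socket.socketpair() for _ in range(4)]
watched = [server for server, _client in pairs]

# Only "client" number 2 sends data.
pairs[2][1].send(b"x")

# The full list is handed to the kernel on every call, and every entry is checked.
readable, _, _ = select.select(watched, [], [], 1.0)
ready_indexes = [watched.index(sock) for sock in readable]
print(ready_indexes)

for server, client in pairs:
    server.close()
    client.close()
```

Note that the caller must rebuild and resubmit the whole set each time, so the per-call cost grows with the number of watched descriptors rather than with the number of ready ones.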
The New Way: Epoll (O(1))
epoll is an event-driven mechanism.
- Red-Black Tree: Stores the set of monitored FDs (O(log N) insertion and removal).
- Ready List: A doubly-linked list of FDs that actually have events.
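When a packet arrives, the hardware interrupt fires, the network stack processes the packet, and a wakeup callback registered by epoll adds that specific FD to the Ready List.
When you call epoll_wait(), it doesn’t scan the monitored set. It just returns the Ready List.
- Cost: O(1) with respect to the number of monitored connections. The work is proportional only to the number of ready events returned.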
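The register-once, wait-many pattern can be sketched with Python's `select.epoll` (Linux only). The 100 socket pairs and the two chosen senders are assumptions for illustration.

```python
import select
import socket

# 100 connected pairs stand in for client connections (hypothetical setup).
pairs = [socket.socketpair() for _ in range(100)]
index_by_fd = {server.fileno(): i for i, (server, _client) in enumerate(pairs)}

ep = select.epoll()
for server, _client in pairs:
    # epoll_ctl(EPOLL_CTL_ADD): a one-time insertion into the interest set.
    ep.register(server.fileno(), select.EPOLLIN)

# Only two of the 100 "clients" send data.
pairs[7][1].send(b"a")
pairs[42][1].send(b"b")

# epoll_wait(): returns only the FDs that have events, not the whole set.
events = ep.poll(1.0)
ready = sorted(index_by_fd[fd] for fd, _mask in events)
print(ready)

ep.close()
for server, client in pairs:
    server.close()
    client.close()
```

Unlike the select() pattern, the interest set is submitted once; each later wait call passes no FD list at all.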
Exotic File Descriptors
Linux takes the abstraction to the extreme.
1. eventfd
A counter stored in the kernel.
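- Thread A writes 1 (the value is added to the counter).
- Thread B reads (it receives the counter value, and the counter resets to zero).
- Use case: High-performance inter-thread notification without locking overhead.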
2. timerfd
A timer that alerts you via a file descriptor.
- Use case: You can add this FD to your epoll loop. Now your network loop handles network events AND timer events in the same thread. No separate “Timer Thread” needed.
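The sketch below waits on a timer through epoll. It calls `timerfd_create` and `timerfd_settime` through `ctypes`, and it assumes 64-bit Linux, where both `timespec` fields are C `long`; the structure class names are my own. Python 3.13 adds `os.timerfd_create`, which would avoid the `ctypes` layer.

```python
import ctypes
import os
import select
import struct
import time

libc = ctypes.CDLL(None, use_errno=True)

class Timespec(ctypes.Structure):
    # Layout assumes 64-bit Linux (time_t and long are both C long).
    _fields_ = [("tv_sec", ctypes.c_long), ("tv_nsec", ctypes.c_long)]

class Itimerspec(ctypes.Structure):
    _fields_ = [("it_interval", Timespec), ("it_value", Timespec)]

tfd = libc.timerfd_create(time.CLOCK_MONOTONIC, 0)
if tfd < 0:
    raise OSError(ctypes.get_errno(), "timerfd_create failed")

# One-shot timer: fire once, 50 ms from now (a zero interval means no repeat).
spec = Itimerspec(Timespec(0, 0), Timespec(0, 50_000_000))
if libc.timerfd_settime(tfd, 0, ctypes.byref(spec), None) != 0:
    raise OSError(ctypes.get_errno(), "timerfd_settime failed")

# The timer is now just another FD in the event loop.
ep = select.epoll()
ep.register(tfd, select.EPOLLIN)

start = time.monotonic()
events = ep.poll(2.0)                 # wakes when the timer expires
elapsed = time.monotonic() - start

# Reading returns the number of expirations as an 8-byte unsigned integer.
expirations = struct.unpack("Q", os.read(tfd, 8))[0]

ep.close()
os.close(tfd)
print(events, expirations)
```

In a real server the same `ep.poll()` call would also be watching the listening socket and client sockets, so timer expiry and network traffic arrive through one code path.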
3. signalfd
Receive Unix Signals (SIGINT, SIGTERM) via a file descriptor.
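- Use case: Handle Ctrl+C safely in your main event loop instead of dealing with complex async signal handlers. The signals must first be blocked (sigprocmask) so that they stay pending for the FD instead of triggering their default action.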
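Python's standard library has no direct `signalfd` wrapper, so this sketch calls the libc function through `ctypes`. It assumes Linux with a 128-byte `sigset_t` (the glibc size) and uses SIGUSR1 in place of SIGINT so that the example is harmless to run.

```python
import ctypes
import os
import signal
import struct

libc = ctypes.CDLL(None, use_errno=True)
SIGNO = int(signal.SIGUSR1)

# Step 1: block the signal so it stays pending instead of killing the process.
signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGUSR1})

# Step 2: build a sigset_t for the kernel (128 bytes is an assumption, see above).
mask = ctypes.create_string_buffer(128)
libc.sigemptyset(mask)
libc.sigaddset(mask, SIGNO)

# Step 3: create the descriptor; -1 asks for a new FD rather than updating one.
sfd = libc.signalfd(-1, mask, 0)
if sfd < 0:
    raise OSError(ctypes.get_errno(), "signalfd failed")

# Step 4: deliver the signal to ourselves; it is queued, not handled.
signal.raise_signal(signal.SIGUSR1)

# Step 5: read one struct signalfd_siginfo (128 bytes); its first field,
# ssi_signo, is a 32-bit unsigned signal number.
info = os.read(sfd, 128)
received_signo = struct.unpack_from("I", info, 0)[0]

os.close(sfd)
signal.pthread_sigmask(signal.SIG_UNBLOCK, {signal.SIGUSR1})
print(received_signo == SIGNO)
```

In an event loop, `sfd` would be registered with epoll and read only when it becomes readable, so signal handling runs as ordinary synchronous code.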
Code: Debugging FDs
When your server crashes with Too many open files, you need to dissect the process.
# 1. Who has the most FDs open? (Approximate: lsof also lists cwd and memory-mapped entries.)
lsof | awk '{print $2}' | sort | uniq -c | sort -nr | head -5
# 2. Inspect a specific process (PID 1234)
# Shows the mapping: FD -> Type -> Device
ls -l /proc/1234/fd/
# Output:
# 0 -> /dev/pts/0 (Terminal)
# 1 -> /dev/pts/0
# 3 -> socket:[23412] (TCP Connection)
# 4 -> /var/log/nginx/access.log (Disk File)
# 5 -> anon_inode:[eventpoll] (The epoll instance itself!)
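The same /proc inspection can be done from inside a program. This is a small sketch with a hypothetical helper, `list_fds`, that resolves each symlink under /proc/PID/fd; it assumes Linux with /proc mounted.

```python
import os
import socket

def list_fds(pid="self"):
    """Return {fd_number: target} for a process by reading /proc/<pid>/fd."""
    fd_dir = f"/proc/{pid}/fd"
    table = {}
    for name in os.listdir(fd_dir):
        try:
            table[int(name)] = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The FD was closed between listdir() and readlink()
            # (for example, the FD used to scan the directory itself).
            continue
    return table

# Open a few different kinds of FD so the table has something to show.
pipe_r, pipe_w = os.pipe()
sock = socket.socket()

table = list_fds()
for fd, target in sorted(table.items()):
    print(fd, "->", target)

pipe_target = table[pipe_r]
sock_target = table[sock.fileno()]

os.close(pipe_r)
os.close(pipe_w)
sock.close()
```

Counting the entries in this table over time is a simple way to spot a descriptor leak before the process hits its limit.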
Practice Exercises
Exercise 1: The Redirect (Beginner)
Task: ls > out.txt.
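Physics: The shell forks a child, which calls open("out.txt", O_WRONLY|O_CREAT|O_TRUNC) -> returns FD 3. The child then calls dup2(3, 1), closes FD 3, and execs ls. Now FD 1 (STDOUT) points to the file struct for out.txt. ls writes to FD 1 with the same write() call it would use for a terminal.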
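The dup2() step can be tried in a single process. The sketch below redirects FD 1 to a file, writes to it, and then restores the original stdout; the output path and message are illustrative assumptions, and a real shell would do the redirection in a forked child before exec.

```python
import os
import sys
import tempfile

out_path = os.path.join(tempfile.mkdtemp(), "out.txt")

sys.stdout.flush()                 # do not let buffered text leak into the file
saved_stdout = os.dup(1)           # keep a second index for the original stdout

fd = os.open(out_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
os.dup2(fd, 1)                     # FD 1 now points at the struct file for out.txt
os.close(fd)                       # the extra index is no longer needed

# Anything written to FD 1 now lands in the file.
os.write(1, b"listing goes here\n")

os.dup2(saved_stdout, 1)           # point FD 1 back at the original stdout
os.close(saved_stdout)

with open(out_path, "rb") as f:
    captured = f.read()
print(captured)
```

The writer never names the file; it only ever refers to index 1, which is why redirection works for programs that know nothing about it.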
Exercise 2: The Socket (Intermediate)
Task: Run nc -l 8080. In another tab, find its PID and check /proc/PID/fd.
Observation: You will see a socket:[...] link.
Exercise 3: The Limit (Advanced)
Task: Check ulimit -n.
Challenge: Write a Python script that opens files in a loop until it crashes.
Fix: Modify /etc/security/limits.conf to raise the limit to 65535.
Knowledge Check
- Is a File Descriptor a pointer?
- What data structure does select() use to check for events?
- Why is epoll faster than select for high concurrency?
- What actually happens when you write to a signalfd?
- What does eventfd store?
Answers
- No. It is an integer index into the process’s fd_array.
- A Bitmap. The kernel scans its bits linearly on every call.
- O(1) vs O(N) in the number of monitored FDs. Epoll uses a Ready List populated by callbacks, so no scanning is needed.
- Nothing useful. A signalfd does not support writing, so the write fails. You read from it to receive signals that the kernel has queued for you.
- A 64-bit unsigned integer counter. Used for lightweight notification.
Summary
- FD: An Index.
- VFS: The Interface.
- Epoll: The Scaler.
- Everything: Is a File.