The Physics of Data: B-Trees, LSM-Trees & WAL
Databases are not magic boxes. The integrity physics of Write Ahead Logs (WAL), Read vs Write optimization (B-Tree vs LSM), and ACID atomicity.
🎯 What You'll Learn
- Compare Storage Engines: B-Tree (Read Heavy) vs LSM Tree (Write Heavy)
- Master WAL Physics (Durability via Append-Only Logs)
- Implement Atomicity (Commit vs Rollback Physics)
- Tune Buffer Pools (Dirty Pages & Checkpoints)
- Visualize Database Compaction (LSM Merge)
📚 Prerequisites
Before this lesson, you should understand:
Introduction
A Database is a file system that lies to you. It tells you “Data is Saved” before it hits the disk platter. How? The Write Ahead Log (WAL).
This lesson deconstructs the physics of Persistence, Indexing, and Atomicity.
Part 1: Storage Physics (B-Tree vs LSM)
There are only two fundamental ways to store data on disk.
1. The B-Tree (Read Optimized)
- Used by: PostgreSQL, MySQL (InnoDB), Oracle.
- Physics: Data is stored in fixed-size pages (e.g., 8KB). A sorted Tree structure points to these pages.
- Read: . extremely fast. You traverse pointers down the tree.
- Write: Random I/O. To change one row, you must load the 8KB page, modify it, and write it back. This is slow on HDDs and amplifies writes on SSDs.
2. The LSM Tree (Write Optimized)
- Used by: Cassandra, RocksDB, LevelDB.
- Physics: Log-Structured Merge Tree.
- Writes go to memory (MemTable).
- When memory is full, valid data is FLUSHED to an immutable file (SSTable) on disk.
- Writes are strictly Sequential.
- Write: Extremely Fast (Append-only).
- Read: Slower. You may have to check multiple SSTables to find the latest version.
Decision Physics:
- Read:Write Ratio > 10:1 B-Tree.
- Write:Read Ratio > 1:1 LSM Tree.
Part 2: Durability Physics (The WAL)
How does a database promise data is safe if it’s still in RAM? The Write Ahead Log.
- Append the mutation to the WAL file on disk (
O_DIRECT,fsync). - Update the data in Memory (Buffer Pool).
- Acknowledge “Success” to the user.
If the power dies 1ms later:
- Memory is lost.
- On reboot, the DB reads the WAL and “replays” the changes to restore the state.
- Physics: Sequential formatting is 100x faster than random updating. We cheat death by writing sequentially first.
Part 3: Atomicity Physics (ACID)
Atomicity: All or Nothing. How do you update 2 rows at once?
BEGIN;
UPDATE Account A SET balance = balance - 100;
-- Power Failure Here
UPDATE Account B SET balance = balance + 100;
COMMIT;
Implementation:
The DB writes a BEGIN marker to the WAL.
It writes the first update.
It does NOT write COMMIT yet.
On recovery, the DB sees a BEGIN without a COMMIT.
It acts as if the transaction never happened. It effectively rewinds history.
Practice Exercises
Exercise 1: Storage Engine Duel (Beginner)
Task: Insert 1 Million rows into SQLite (B-Tree) vs RocksDB (LSM). Observation: RocksDB inserts will be significantly faster due to sequential writes.
Exercise 2: Kill -9 (Intermediate)
Task: Start a transaction. Insert a record. Kill the process (kill -9) before commit.
Action: Restart DB. Select the record.
Result: It is gone. The WAL replay ignored the uncommitted transaction.
Exercise 3: Dirty Pages (Advanced)
Task: Configure Postgres/MySQL checkpoint_timeout.
Action: Run a heavy write load.
Observation: Watch I/O spikes when the Checkpoint process forces Dirty Pages from RAM to Disk.
Knowledge Check
- Which structure is better for Analytics (Scanning)?
- Why is WAL faster than updating the B-Tree directly?
- What happens to pages in the Buffer Pool that are modified?
- Does an LSM Tree modify files in place?
- ACID: What is “Isolation”?
Answers
- LSM / Columnar. Sequential packing allows efficient compression and scanning.
- Sequential I/O. Appending to a log is faster than Random I/O updates.
- Marked Dirty. They are flushed to disk later by a Checkpoint.
- No. It writes new immutable files (SSTables) and merges them later.
- Visibility. Ensuring one transaction cannot see the intermediate state of another.
Summary
- B-Tree: Read Fast, Write Slow.
- LSM: Write Fast, Read Slow.
- WAL: The cheat code for Durability.
- ACID: The promise of Sanity in a chaotic universe.
Questions about this lesson? Working on related infrastructure?
Let's discuss