The Physics of Caching: Locality, Coherence & Thundering Herds
Why accessing RAM is 100x slower than L1. The physics of Temporal vs Spatial Locality, Cache Coherence problems, and the mathematics of the Thundering Herd.
🎯 What You'll Learn
- Quantify Latency: L1 vs L2 vs RAM vs Network
- Master Locality of Reference (Temporal & Spatial)
- Implement Cache-Aside vs Write-Through Patterns
- Solve the Thundering Herd Problem (Probabilistic Locking)
- Tune Redis Eviction Policies (LRU vs LFU Physics)
📚 Prerequisites
Before this lesson, you should understand:
Introduction
In physics, the speed of light is the limit. In computers, the speed of light is too slow. Light travels 30cm in 1 nanosecond. A CPU cycle is 0.2 nanoseconds. If data is not physically close to the CPU, the CPU stalls.
Caching is not just “saving data.” It is Hierarchical Storage Optimization based on the probability of access.
The Latency Hierarchy
Every engineer should memorize these orders of magnitude:
| Storage Layer | Latency | Physics Metaphor |
|---|---|---|
| L1 Cache | 1 ns | Picking a book from your desk. |
| L2 Cache | 4 ns | Picking a book from the bookshelf. |
| RAM | 100 ns | Walking to the library. |
| SSD (NVMe) | 100,000 ns | Flying to New York. |
| Network (Redis) | 500,000 ns | Flying to the Moon. |
Key Insight: Redis is fast, but at 500,000 ns it is 500,000x slower than L1 Cache (and 5,000x slower than RAM). Do not overuse it inside extremely hot loops.
The Physics: Locality of Reference
Caching works because code is predictable.
- Temporal Locality: If you used a variable recently, you will likely use it again soon.
- Implementation: Least Recently Used (LRU) Eviction.
- Spatial Locality: If you accessed address N, you will likely access address N+1 soon.
- Implementation: Cache Lines (fetching 64 bytes at a time).
Anti-Pattern: Linked Lists. Nodes are scattered in memory (random pointers), which destroys Spatial Locality.
Pro-Pattern: Arrays. Contiguous memory blocks maximize Spatial Locality.
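Temporal locality is what an LRU eviction policy exploits. A minimal sketch of an LRU cache, built on `collections.OrderedDict` (the class name `LRUCache` is illustrative, not a real library API):

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU cache: evicts the least recently used key at capacity."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)   # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict least recently used

cache = LRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")      # "a" is now most recently used
cache.put("c", 3)   # capacity exceeded: "b" (least recently used) is evicted
```

An LFU policy would instead track access *counts* rather than recency, which Redis approximates with `allkeys-lfu`.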
The Thundering Herd Problem
Scenario:
You have a hot cache key price:BTC with TTL 1 second.
At time T, the key expires.
10,000 requests arrive simultaneously.
All 10,000 see a MISS.
All 10,000 hit the Database to calculate the price.
Result: Database CPU goes to 100%. Service dies.
Physics Solution: Probabilistic Early Expiration (X-Fetch). Instead of a hard TTL, each request recomputes before expiration with a probability that grows as expiry approaches: recompute when now - delta * beta * ln(rand()) >= expiry, where delta is the cost of the recompute and rand() is uniform on (0, 1]. In expectation only one request triggers the recompute, preventing the herd.
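A minimal sketch of the probabilistic check, following the X-Fetch formula (the function name and parameters are illustrative):

```python
import math
import random
import time

def should_recompute_early(expiry_ts, delta, beta=1.0, now=None):
    """Probabilistic early expiration (X-Fetch).

    expiry_ts: absolute expiry time of the cached value (seconds)
    delta:     how long the recompute takes (seconds)
    beta:      >1 recomputes more eagerly, <1 less eagerly

    -log(rand()) is an exponentially distributed random variable, so
    the chance of an early recompute rises sharply as expiry nears,
    but almost no requests fire while the value is still fresh.
    """
    if now is None:
        now = time.time()
    return now - delta * beta * math.log(random.random()) >= expiry_ts
```

A caller that gets `True` recomputes and refreshes the TTL; everyone else keeps serving the still-valid cached value, so the herd never forms.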
Cache Consistency: The Hard Problem
“There are only two hard things in Computer Science: cache invalidation and naming things.” — Phil Karlton
Write-Through: App writes to Cache AND DB.
- Pros: Read consistency.
- Cons: Write latency doubles.
Cache-Aside (Lazy Loading): App reads Cache. Miss? Read DB -> Write Cache.
- Pros: Fast writes.
- Cons: Stale hits are possible if DB is updated via backchannel.
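Both patterns fit in a few lines. A sketch using plain dicts as stand-ins for Redis and the database (the function names are illustrative, not a real client API):

```python
# Plain dicts stand in for Redis and the database.
cache = {}
db = {"price:BTC": 67_000}

def cache_aside_get(key):
    """Cache-Aside: read cache; on miss, read DB and populate cache."""
    if key in cache:
        return cache[key]          # HIT
    value = db[key]                # MISS: fall through to the DB
    cache[key] = value             # populate for the next reader
    return value

def write_through_set(key, value):
    """Write-Through: every write goes to BOTH the DB and the cache."""
    db[key] = value                # durable store first
    cache[key] = value             # cache never lags the DB
```

The trade-off is visible in the shapes: Cache-Aside pays nothing on write but risks stale reads; Write-Through pays two writes so reads stay consistent.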
Physics Solution: Change Data Capture (CDC). Trigger cache invalidation by listening to the Database Writelog (Postgres WAL) directly.
Practice Exercises
Exercise 1: RAM vs Redis (Beginner)
Task: Benchmark a Python dictionary vs Redis GET.
Action: Measure 10,000 reads.
Result: Dictionary (L1/RAM) will be nanoseconds. Redis will be milliseconds.
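A real Redis benchmark needs a running server, so this sketch simulates the network hop with a 0.5 ms sleep (the round-trip figure from the latency table); the dictionary side is measured for real:

```python
import time

def bench_dict(n=10_000):
    """10,000 in-process dictionary reads."""
    d = {"price:BTC": 67_000}
    start = time.perf_counter()
    for _ in range(n):
        d["price:BTC"]
    return time.perf_counter() - start

def bench_fake_network(n=100, rtt=0.0005):
    """Simulate a 0.5 ms round trip per GET (a local Redis hop)."""
    start = time.perf_counter()
    for _ in range(n):
        time.sleep(rtt)
    return time.perf_counter() - start

local = bench_dict()
remote = bench_fake_network()
# Per-read cost: the dict read is tens of nanoseconds;
# the simulated network hop is at least 0.5 ms per read.
```

With a real `redis-py` client the gap narrows only slightly: the network round trip, not Redis itself, dominates.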
Exercise 2: Simulating Stampede (Intermediate)
Task: Create a script that spawns 100 threads requesting a key.
Action: Delete the key. Watch them all execute the “DB Fetch” function.
Fix: Implement a Semaphore/Lock around the “DB Fetch”.
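The stampede and its fix can be sketched as follows (a dict stands in for the cache, a sleep for the slow query):

```python
import threading
import time

db_calls = 0
cache = {}
lock = threading.Lock()

def fetch_from_db(key):
    global db_calls
    db_calls += 1             # count how many requests reached the DB
    time.sleep(0.01)          # simulate a slow query
    return 42

def get(key):
    """Single-flight read: only one thread recomputes a missing key."""
    if key in cache:
        return cache[key]
    with lock:
        if key in cache:      # double-check: another thread may have filled it
            return cache[key]
        cache[key] = fetch_from_db(key)
        return cache[key]

threads = [threading.Thread(target=get, args=("price:BTC",)) for _ in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# db_calls == 1: the herd of 100 was reduced to a single DB fetch
```

Remove the lock and the double-check, and `db_calls` climbs toward 100: that is the herd.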
Exercise 3: Array vs Linked List (Advanced)
Task: Iterate over a 10MB Array vs a 10MB Linked List.
Action: Measure time.
Why: The Array fits in Cache Lines. The List causes constant Cache Misses.
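A sketch of the comparison. Caveat: in CPython, interpreter overhead dampens the pure cache effect (a C or Rust version shows it far more starkly), but `array('d')` at least gives a contiguous buffer of doubles versus scattered heap-allocated nodes:

```python
import time
from array import array

N = 200_000

# Contiguous buffer of doubles: neighbours share cache lines.
arr = array("d", range(N))

# Linked list: each node is a separate heap allocation.
class Node:
    __slots__ = ("value", "next")
    def __init__(self, value, next=None):
        self.value = value
        self.next = next

head = None
for i in reversed(range(N)):
    head = Node(float(i), head)

start = time.perf_counter()
total_arr = sum(arr)                      # sequential scan of one buffer
arr_time = time.perf_counter() - start

start = time.perf_counter()
total_list, node = 0.0, head              # pointer-chasing traversal
while node is not None:
    total_list += node.value
    node = node.next
list_time = time.perf_counter() - start
```

Both traversals compute the same sum; the interesting output is the timing gap between the contiguous scan and the pointer chase.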
Knowledge Check
- Why is a Linked List bad for Caching?
- What is the “Thundering Herd”?
- If Redis is 0.5ms, is it “fast”?
- What is Spatial Locality?
- How does CDC improve Cache Consistency?
Answers
- Poor Spatial Locality. Nodes are scattered, preventing Cache Line optimizations.
- Simultaneous Misses. Thousands of requests hitting the backing store at once.
- Depends. Fast for a network hop, glacial for a CPU: at 0.2 ns per cycle, 0.5 ms is 2.5 million cycles.
- Neighborhood Access. Accessing memory locations near each other.
- Real-time Invalidation. It reacts to the source of truth (DB Log).
Summary
- L1 Cache: The only thing that is truly fast.
- Locality: Arrays > Linked Lists.
- Thundering Herd: Solve with Math (Probabilistic Recompute) or Locks.
- Consistency: The hardest problem.