CPU Isolation for HFT: The isolcpus Lie and What Actually Works
Why the standard 'isolcpus' kernel parameter doesn't fully isolate your critical threads, and the combination of settings required for true deterministic scheduling.
At Akuna, a quant’s P99 latency chart showed perfect 15µs performance, except for a 200µs spike every 4ms.
The culprit wasn’t the application. It was the kernel’s CONFIG_HZ=250 timer interrupt, still firing on the “isolated” CPU. The isolcpus parameter had done nothing to stop it.
This post documents the three-part combination required for true CPU isolation: isolcpus, nohz_full, and rcu_nocbs.
1. The Physics of Kernel Interrupts
The Linux kernel, by default, is a polite roommate. It does three things on every CPU, every few milliseconds, to keep the house clean:
- The Timer Tick (APIC): Every 1/CONFIG_HZ seconds (usually 4ms), the hardware Local APIC fires an interrupt. The kernel wakes up, updates system time (jiffies), and checks if the current process has run too long.
- RCU Callbacks (Garbage Collection): Linux uses Read-Copy-Update for lock-free data structures. When a writer updates data, the old version isn’t deleted until all readers are done. This “GC” happens on every CPU.
- Scheduler Load Balancing: The kernel looks for under-utilized CPUs and migrates tasks to them.
isolcpus=4-7 only disables #3 (load balancing). The Timer Tick and RCU callbacks still fire, causing Instruction Cache (L1i) Pollution and Context Switch Overhead.
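To see which tick rate a given machine is actually built with, one option is to read CONFIG_HZ from the kernel build config. The sketch below assumes the distribution ships that config at /boot/config-$(uname -r), which is common on Debian, Ubuntu, and RHEL-family systems but not guaranteed elsewhere. The helper names config_hz and tick_period_us are illustrative, not standard tools.

```shell
#!/bin/sh
# Extract the CONFIG_HZ value from kernel config text on stdin.
# Lines such as CONFIG_HZ_250=y are deliberately not matched.
config_hz() {
    sed -n 's/^CONFIG_HZ=\([0-9][0-9]*\)$/\1/p'
}

# Convert a tick rate in Hz to the tick period in microseconds.
tick_period_us() {
    echo $(( 1000000 / $1 ))
}

# Assumption: the running kernel's config lives here; skip quietly if not.
cfg="/boot/config-$(uname -r)"
if [ -r "$cfg" ]; then
    hz=$(config_hz < "$cfg")
    if [ -n "$hz" ]; then
        echo "CONFIG_HZ=$hz, one tick every $(tick_period_us "$hz") us"
    fi
fi
```

On a 250Hz kernel this reports a 4000µs period, which matches the 4ms spacing of the spikes described above.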
2. The Decision Matrix
| Approach | Timer Interrupts (APIC) | RCU Callbacks | Scheduler | Verdict |
|---|---|---|---|---|
| A. isolcpus only | Yes (Bad) | Yes (Bad) | No | The Rookie Mistake. |
| B. isolcpus + nohz_full | No (Mostly) | Yes (Bad) | No | Better, but RCU still creates jitter. |
| C. Full Isolation | No | No | No | Selected. True deterministic execution. |
Why This Matters: A 250Hz timer interrupt (every 4ms) introduces a 1-5µs jitter spike. If your trade loop takes 10µs, you have a 0.25% probability (10µs / 4ms) of being interrupted mid-trade. In HFT, 0.25% is too high.
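The arithmetic behind that figure can be reproduced in a few lines. This is a simplified model: it assumes the loop starts at a uniformly random point relative to the tick, so the chance of a tick landing inside one iteration is just loop time divided by tick period. The function name interrupt_probability_pct is made up for this illustration.

```shell
#!/bin/sh
# Chance (in percent) that a periodic tick lands inside one loop iteration.
# Simplified model: P = loop_time / tick_period = loop_us * hz / 1e6,
# so the percentage is loop_us * hz / 1e4.
# $1 = loop duration in microseconds, $2 = tick rate in Hz.
interrupt_probability_pct() {
    awk -v loop_us="$1" -v hz="$2" \
        'BEGIN { printf "%.2f\n", loop_us * hz / 10000 }'
}

interrupt_probability_pct 10 250    # the 10 us loop at 250 Hz from the text
interrupt_probability_pct 10 1000   # the same loop on a 1000 Hz kernel
```

The first call prints 0.25 and the second prints 1.00: a faster tick shortens each interruption window but makes a hit four times as likely.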
3. The Kill: Full Isolation Configuration
You need to tell the kernel: “These cores are not yours anymore.”
Step 1: Update GRUB
We assume an 8-core system where cores 4-7 are dedicated to trading.
# /etc/default/grub
GRUB_CMDLINE_LINUX="isolcpus=4-7 nohz_full=4-7 rcu_nocbs=4-7"
- isolcpus=4-7: Tells the scheduler “Don’t put random processes here.”
- nohz_full=4-7: Tells the timer subsystem “If there is only 1 task running, don’t fire the tick.” (Adaptive Ticks).
- rcu_nocbs=4-7: Tells the RCU subsystem “Don’t run callbacks here. Offload them to cores 0-3.”
Step 2: Regenerate GRUB & Reboot
sudo update-grub && sudo reboot
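After the reboot, it is worth confirming that the parameters actually reached the kernel. The sketch below checks /proc/cmdline for all three parameters and then prints the kernel's own view from sysfs. The helper has_full_isolation is a hypothetical name, and it uses an exact string match, so a variant such as isolcpus=managed_irq,4-7 would be reported as different even though it is valid.

```shell
#!/bin/sh
# Return 0 if the kernel command line in $1 carries isolcpus, nohz_full and
# rcu_nocbs, each set to exactly the CPU list in $2. (Illustrative helper.)
has_full_isolation() {
    for param in isolcpus nohz_full rcu_nocbs; do
        case " $1 " in
            *" $param=$2 "*) ;;   # present with the expected CPU list
            *) echo "missing or different: $param=$2" >&2; return 1 ;;
        esac
    done
    return 0
}

# Check the running kernel (cores 4-7, as in the GRUB example).
if [ -r /proc/cmdline ]; then
    if has_full_isolation "$(cat /proc/cmdline)" "4-7"; then
        echo "cmdline OK"
    fi
    # What the kernel applied. The nohz_full file is only present on
    # kernels built with CONFIG_NO_HZ_FULL.
    for f in isolated nohz_full; do
        if [ -r "/sys/devices/system/cpu/$f" ]; then
            echo "$f: $(cat "/sys/devices/system/cpu/$f")"
        fi
    done
fi
```

If nohz_full is on the command line but the sysfs file is missing or empty, the kernel was probably built without adaptive-tick support and the parameter is being ignored.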
Step 3: The “One Process” Rule
nohz_full only works if exactly one task is running on the core. If you start a second thread, the kernel must re-enable the timer tick to multitask between them.
# Pin your app explicitly
taskset -c 4 ./my_trading_engine
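One way to check the rule is to list which threads last ran on the isolated core. The sketch below filters procps-style ps output by its psr column. Note that psr is the CPU a thread last ran on, not its affinity mask, so treat it as a snapshot. threads_on_cpu is an illustrative helper name.

```shell
#!/bin/sh
# Filter "ps -eLo psr,pid,tid,comm" output (on stdin) down to the threads
# whose last-run CPU (the psr column) equals $1. The header line is skipped.
threads_on_cpu() {
    awk -v cpu="$1" 'NR > 1 && $1 == cpu { print $0 }'
}

# Live check for core 4, if ps is available (procps option syntax assumed).
if command -v ps >/dev/null 2>&1; then
    ps -eLo psr,pid,tid,comm 2>/dev/null | threads_on_cpu 4 || true
fi
```

Per-CPU kernel threads such as ksoftirqd/4 and migration/4 will show up and are expected. What should not appear is a second user-space thread. Keep in mind that taskset -c 4 applies to the whole process, so a multi-threaded engine would place every thread on core 4; in that case each hot thread likely needs its own isolated core.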
4. The Tool: Auditing Isolation State
How do you know it worked? Watch the interrupts.
# Watch Local Timer Interrupts (LOC) on Cores 4-7
watch -n 1 "awk '/LOC:/ {print \$1, \$6, \$7, \$8, \$9}' /proc/interrupts"
- Before: The numbers for CPU4-7 increment by ~250 every second.
- After: The numbers should stay frozen. (You might see 1 tick per second for statistics updates, which is unavoidable on some kernels).
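For a number rather than an eyeball check, the LOC counter can be sampled twice and subtracted. This is a rough sketch that assumes the x86 /proc/interrupts layout, where the LOC row has one column per CPU; loc_count and loc_delta are hypothetical helper names.

```shell
#!/bin/sh
# Print the LOC (local timer) count for CPU number $1 from /proc/interrupts
# text on stdin. Field 1 is "LOC:", so CPU n is field n + 2. Nothing is
# printed if that field is not a number (for example, the CPU does not exist).
loc_count() {
    awk -v cpu="$1" \
        '$1 == "LOC:" && $(cpu + 2) ~ /^[0-9]+$/ { print $(cpu + 2) }'
}

# Timer interrupts seen on CPU $1 over $2 seconds.
loc_delta() {
    before=$(loc_count "$1" < /proc/interrupts)
    sleep "$2"
    after=$(loc_count "$1" < /proc/interrupts)
    if [ -z "$before" ] || [ -z "$after" ]; then
        echo "no LOC counter for CPU$1" >&2
        return 1
    fi
    echo $(( after - before ))
}

# Usage: sample each trading core for one second.
#   for cpu in 4 5 6 7; do echo "CPU$cpu: $(loc_delta "$cpu" 1) ticks/s"; done
```

On a fully isolated core running a single pinned task, the result should be close to zero. A value near CONFIG_HZ means the tick is still running there.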
5. Systems Thinking: The Trade-offs
- Reduced Core Count: 4 isolated cores means 4 fewer cores for your OS, logging agents, and SSH sessions. If you overload the “housekeeping” cores (0-3), the whole system becomes sluggish.
- Debugging Blindness: top and standard profilers assume the timer tick is running. When you disable it, CPU usage stats for that core might report 100% or 0% incorrectly. Use perf record -C 4 for truth.
- IRQ Balance: Ensure hardware interrupts (NIC, NVMe) are not routed to isolated cores. Stop the irqbalance service and manually pin IRQs to housekeeping cores.
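The IRQ pinning step could look roughly like the following. It is a blunt sketch that moves every IRQ it can to the housekeeping cores by writing to each smp_affinity_list file. A production setup would more likely pin NIC queues individually. pin_irqs is an illustrative name, and the systemctl lines assume a systemd-based host.

```shell
#!/bin/sh
# Point every IRQ directory under $1 (normally /proc/irq) at the CPU list
# in $2 by writing to its smp_affinity_list file. (Illustrative helper.)
pin_irqs() {
    for f in "$1"/*/smp_affinity_list; do
        [ -e "$f" ] || continue
        # Some IRQs (per-CPU or kernel-managed ones) reject the write;
        # report them and carry on.
        if ! { echo "$2" > "$f"; } 2>/dev/null; then
            echo "could not move ${f%/smp_affinity_list}" >&2
        fi
    done
}

# Usage (as root), after stopping irqbalance so it does not undo the change:
#   systemctl stop irqbalance && systemctl disable irqbalance
#   pin_irqs /proc/irq 0-3
```

Affinity written this way does not survive a reboot, so it would need to be reapplied from a boot-time script or unit.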
6. The Philosophy
isolcpus is a half-measure. The kernel’s default behavior assumes you want fairness. For HFT, you want unfairness. You want a dictator core.
True isolation is not a single flag. It is a contract with the kernel: “I will manage this CPU. You will not touch it.” Achieving this requires disabling three subsystems, not one.
Most engineers stop at isolcpus and wonder why their P99 spikes. You now know why.
Audit Your Infrastructure
Want to check if your servers are configured for low latency? Run latency-audit, which checks CPU governors, C-states, NUMA, and 30+ other settings in seconds.
pip install latency-audit && latency-audit