
CPU Isolation for HFT: The isolcpus Lie and What Actually Works

Why the standard 'isolcpus' kernel parameter doesn't fully isolate your critical threads, and the combination of settings required for true deterministic scheduling.


At Akuna, a quant’s P99 latency chart showed perfect 15µs performance, except for a 200µs spike every 4ms.

The culprit wasn’t the application. It was the kernel’s timer interrupt, firing every 4ms (CONFIG_HZ=250) on the “isolated” CPU. The isolcpus parameter had done nothing to stop it.

This post documents the three-part combination required for true CPU isolation: isolcpus, nohz_full, and rcu_nocbs.

1. The Physics of Kernel Interrupts

The Linux kernel, by default, is a polite roommate. It does three things on every CPU, every few milliseconds, to keep the house clean:

  1. The Timer Tick (APIC): Every 1/CONFIG_HZ seconds (usually 4ms, i.e. CONFIG_HZ=250), the hardware Local APIC fires an interrupt. The kernel wakes up, updates system time (jiffies), and checks if the current process has run too long.
  2. RCU Callbacks (Garbage Collection): Linux uses Read-Copy-Update for lock-free data structures. When a writer updates data, the old version isn’t deleted until all readers are done. This “GC” happens on every CPU.
  3. Scheduler Load Balancing: The kernel looks for under-utilized CPUs and migrates tasks to them.

isolcpus=4-7 only disables #3 (load balancing). The Timer Tick and RCU callbacks still fire, and each one forces a switch into kernel mode that costs time and pollutes the Instruction Cache (L1i).

[Diagram: App on CPU 4 → Timer Tick (every 4ms) → Context Switch to Ring 0 (1-5µs) → L1 Cache Flushed → App Resumes]
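To see which tick rate your own kernel was built with, you can read it out of the build config. The snippet below is a minimal sketch: the helper name tick_period_us is made up for this post, and it assumes your distro ships the config as /boot/config-<release> (some kernels only expose /proc/config.gz, which this does not handle).

```shell
# Sketch: derive the timer tick period from a kernel build config.
# tick_period_us is a hypothetical helper, not a standard tool.

# tick_period_us CONFIG_FILE -> prints the tick period in microseconds
tick_period_us() {
    local hz
    # Match only the numeric "CONFIG_HZ=<n>" line, not CONFIG_HZ_250=y.
    hz=$(sed -n 's/^CONFIG_HZ=\([0-9][0-9]*\)$/\1/p' "$1" | head -n 1)
    [ -n "$hz" ] || return 1
    echo $(( 1000000 / hz ))
}

# Example (commented out so the definition has no side effects):
# tick_period_us "/boot/config-$(uname -r)"
```

A kernel built with CONFIG_HZ=250 yields 4000, matching the 4ms spike interval described above.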

2. The Decision Matrix

| Approach | Timer Interrupts (APIC) | RCU Callbacks | Scheduler Load Balancing | Verdict |
| --- | --- | --- | --- | --- |
| A. isolcpus only | Yes (Bad) | Yes (Bad) | No | The Rookie Mistake. |
| B. isolcpus + nohz_full | No (Mostly) | Yes (Bad) | No | Better, but RCU still creates jitter. |
| C. Full Isolation | No | No | No | Selected. True deterministic execution. |

Why This Matters: A 250Hz timer interrupt (every 4ms) introduces a 1-5µs jitter spike. If your trade loop takes 10µs, the chance that a tick lands inside any given iteration is 10µs / 4ms ≈ 0.25%. In HFT, 0.25% is too high.
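The same arithmetic is easy to rerun for your own numbers. This is a rough sketch with a hypothetical helper name (tick_overlap_pct); it assumes the tick arrives at a uniformly random point relative to the start of the loop and that the loop is much shorter than the tick period.

```shell
# Sketch: percentage of loop iterations that a periodic tick lands in.
# tick_overlap_pct is a hypothetical helper name.

# tick_overlap_pct HZ LOOP_US -> prints the percentage with two decimals
tick_overlap_pct() {
    # fraction = loop duration / tick period = LOOP_US * HZ / 1e6
    # percent  = fraction * 100             = LOOP_US * HZ / 1e4
    awk -v hz="$1" -v us="$2" 'BEGIN { printf "%.2f\n", us * hz / 10000 }'
}

# Example:
# tick_overlap_pct 250 10    # prints 0.25
```

At CONFIG_HZ=1000 the same 10µs loop is hit four times as often, which is why a higher tick rate makes the problem worse on a non-isolated core.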

3. The Kill: Full Isolation Configuration

You need to tell the kernel: “These cores are not yours anymore.”

Step 1: Update GRUB

We assume an 8-core system where cores 4-7 are dedicated to trading.

# /etc/default/grub
GRUB_CMDLINE_LINUX="isolcpus=4-7 nohz_full=4-7 rcu_nocbs=4-7"
  • isolcpus=4-7: Tells the scheduler “Don’t put random processes here.”
  • nohz_full=4-7: Tells the timer subsystem “If there is only 1 task running, don’t fire the tick.” (Adaptive Ticks).
  • rcu_nocbs=4-7: Tells the RCU subsystem “Don’t run callbacks here. Offload them to kernel threads that can run on cores 0-3.”
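These parameters only take effect if the kernel was built with the matching features: nohz_full needs CONFIG_NO_HZ_FULL=y and rcu_nocbs needs CONFIG_RCU_NOCB_CPU=y. A quick way to check before you reboot is sketched below; kconfig_enabled is a hypothetical helper, and the config path is an assumption about your distro.

```shell
# Sketch: check whether a kernel config option is built in (=y).
# kconfig_enabled is a hypothetical helper name.

# kconfig_enabled CONFIG_FILE OPTION -> exit status 0 if OPTION=y
kconfig_enabled() {
    grep -q "^$2=y\$" "$1"
}

# Example (assumes /boot/config-$(uname -r) exists on your distro):
# for opt in CONFIG_NO_HZ_FULL CONFIG_RCU_NOCB_CPU; do
#     if kconfig_enabled "/boot/config-$(uname -r)" "$opt"; then
#         echo "$opt: enabled"
#     else
#         echo "$opt: missing, the matching boot parameter will not work"
#     fi
# done
```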

Step 2: Regenerate GRUB & Reboot

# Debian/Ubuntu; on RHEL-family systems use: sudo grub2-mkconfig -o /boot/grub2/grub.cfg
sudo update-grub && sudo reboot
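After the reboot, it is worth confirming that the kernel actually received the parameters, since a typo in the GRUB file fails silently. The sketch below uses a hypothetical helper (cmdline_param) to pull one value out of a command line string; the sysfs files in the comments exist on reasonably recent kernels, but treat their presence as an assumption on older ones.

```shell
# Sketch: extract one boot parameter from a kernel command line.
# cmdline_param is a hypothetical helper name.

# cmdline_param "CMDLINE" NAME -> prints the value of NAME=..., status 1 if absent
cmdline_param() {
    local word
    for word in $1; do          # intentionally unquoted: split on whitespace
        case "$word" in
            "$2"=*) echo "${word#*=}"; return 0 ;;
        esac
    done
    return 1
}

# Example:
# cmdline_param "$(cat /proc/cmdline)" nohz_full
# cat /sys/devices/system/cpu/isolated     # CPUs removed from the scheduler domains
# cat /sys/devices/system/cpu/nohz_full    # CPUs in adaptive-tick mode
```

If the sysfs files are empty while /proc/cmdline shows the parameters, the kernel most likely lacks the required build options.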

Step 3: The “One Process” Rule

nohz_full only stops the tick while at most one runnable task is on the core. If you start a second thread there, the kernel must re-enable the timer tick to multitask between them.

# Pin your app explicitly
taskset -c 4 ./my_trading_engine
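One catch: threads inherit the affinity of the process, so a multi-threaded engine launched this way puts every thread on core 4 and breaks the rule. A quick way to see what is sharing the core is to filter ps output by its psr column (the CPU a thread last ran on). The helper name tasks_on_cpu below is hypothetical, and the sketch only filters text; it does not change any affinity.

```shell
# Sketch: list threads whose last-run CPU matches a given core.
# tasks_on_cpu is a hypothetical helper name.

# tasks_on_cpu CPU -> reads "psr pid tid comm" rows on stdin, keeps rows for CPU
tasks_on_cpu() {
    # The header row ("PSR ...") never equals a CPU number, so it is dropped.
    awk -v cpu="$1" '$1 == cpu'
}

# Example:
# ps -eLo psr,pid,tid,comm | tasks_on_cpu 4
```

Per-CPU kernel threads such as ksoftirqd/4 will always show up and are normally asleep. What you are looking for is a second user-space thread. If you find one, you can move it with taskset -cp <cpu> <tid>.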

4. The Tool: Auditing Isolation State

How do you know it worked? Watch the interrupts.

# Watch Local Timer Interrupts (LOC) on Cores 4-7
watch -n 1 "grep 'LOC:' /proc/interrupts | awk '{print \$1, \$6, \$7, \$8, \$9}'"
  • Before: The numbers for CPU4-7 increment by ~250 every second.
  • After: The numbers should stay frozen. (You might see 1 tick per second for statistics updates, which is unavoidable on some kernels).
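If you would rather have a number than eyeball a refreshing screen, you can sample the LOC counter twice and take the difference. This is a sketch with hypothetical helper names (loc_for_cpu, loc_rate); it assumes CPUs are numbered contiguously from 0 with none offline, so that CPU n sits in column n + 2 of /proc/interrupts.

```shell
# Sketch: measure local timer interrupts per second on one CPU.
# loc_for_cpu and loc_rate are hypothetical helper names.

# loc_for_cpu CPU [FILE] -> prints the cumulative LOC count for CPU
loc_for_cpu() {
    # Column 1 is "LOC:", column 2 is CPU0, so CPU n is column n + 2.
    awk -v cpu="$1" '$1 == "LOC:" { print $(cpu + 2) }' "${2:-/proc/interrupts}"
}

# loc_rate CPU [SECONDS] -> prints average ticks per second over the interval
loc_rate() {
    local before after secs="${2:-1}"
    before=$(loc_for_cpu "$1")
    sleep "$secs"
    after=$(loc_for_cpu "$1")
    echo $(( (after - before) / secs ))
}

# Example:
# for cpu in 4 5 6 7; do echo "CPU$cpu: $(loc_rate "$cpu") ticks/s"; done
```

Run it while your pinned application is busy. An idle isolated core can look quiet even without nohz_full, because the tick also stops on idle CPUs.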

5. Systems Thinking: The Trade-offs

  1. Reduced Core Count: 4 isolated cores means 4 fewer cores for your OS, logging agents, and SSH sessions. If you overload the “housekeeping” cores (0-3), the whole system becomes sluggish.
  2. Debugging Blindness: top and standard profilers assume the timer tick is running. When you disable it, CPU usage stats for that core might report 100% or 0% incorrectly. Use perf record -C 4 for truth.
  3. IRQ Balance: Ensure hardware interrupts (NIC, NVMe) are not routed to isolated cores. Stop the irqbalance service and manually pin IRQs to housekeeping cores.
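For the IRQ pinning in point 3, one blunt but workable approach is to write the housekeeping CPU list into every IRQ's smp_affinity_list file. The sketch below uses a hypothetical helper (pin_irqs). It assumes cores 0-3 are your housekeeping cores, and it takes an optional root directory purely so it can be tried against a scratch directory first.

```shell
# Sketch: steer all IRQs to the housekeeping cores.
# pin_irqs is a hypothetical helper name.

# pin_irqs CPULIST [IRQ_ROOT] -> writes CPULIST to each IRQ's smp_affinity_list
pin_irqs() {
    local root="${2:-/proc/irq}" f
    for f in "$root"/*/smp_affinity_list; do
        [ -e "$f" ] || continue
        # Some IRQs (per-CPU or kernel-managed ones) reject the write; report and move on.
        { echo "$1" > "$f"; } 2>/dev/null || echo "skipped: $f" >&2
    done
}

# Example, from a root shell:
# systemctl stop irqbalance
# pin_irqs 0-3
```

The settings do not survive a reboot, and irqbalance will overwrite them if it is restarted, so in practice this belongs in a boot-time unit or script.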

6. The Philosophy

isolcpus is a half-measure. The kernel’s default behavior assumes you want fairness. For HFT, you want unfairness. You want a dictator core.

True isolation is not a single flag. It is a contract with the kernel: “I will manage this CPU. You will not touch it.” Achieving this requires disabling three subsystems, not one.

Most engineers stop at isolcpus and wonder why their P99 spikes. You now know why.


Audit Your Infrastructure

Want to check if your servers are configured for low latency? Run latency-audit: it checks CPU governors, C-states, NUMA, and 30+ other settings in seconds.

pip install latency-audit && latency-audit
