The Physics of FPGA: Hardware Acceleration
Why Software is too slow. The physics of Tick-to-Trade, Logic Gates, and Pipeline Determinism.
🎯 What You'll Learn
- Deconstruct Tick-to-Trade Latency (Wire-to-Wire)
- Analyze the von Neumann Bottleneck (why CPUs are slow)
- Trace a packet through a Pipelined FPGA Architecture
- Calculate the throughput of a 300 MHz FPGA core
- Audit a Verilog State Machine for Order Processing
Introduction
In the Nanosecond Economy, the CPU is the bottleneck. A CPU is a “Juggler”: It handles thousands of tasks (OS, Network, Logic) by switching between them very fast. An FPGA is an “Assembly Line”: It does one thing, perfectly, in parallel, with zero interruptions.
This lesson explores why firms move trading logic out of software and into reconfigurable hardware to save 700 nanoseconds.
The Physics: Tick-to-Trade (T2T)
The Metric: Time from “First Bit of Market Data In” to “First Bit of Order Out”.
- Software (C++): ~2-5 microseconds.
- FPGA (Hardware): ~40-100 nanoseconds.
Physics: Software pays the “von Neumann Tax” on every packet, plus operating-system overhead along the way:
- Interrupt fires.
- Context Switch to Kernel Mode.
- Copy Packet to RAM.
- Context Switch to User Mode.
- CPU reads RAM into Cache (Cache Miss?).
- CPU executes instruction.
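The FPGA pays almost none of this tax. There is no interrupt, no kernel, and no instruction fetch: the packet streams from the network interface straight through dedicated logic gates, like water through a pipe.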
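To make the T2T metric concrete, here is a minimal sketch of a latency probe that counts clock cycles between "first bit in" and "first bit out". The module name T2TProbe and the rx_start / tx_start pulses are hypothetical; they are assumed to be one-cycle strobes from the network interface and are not part of the lesson. The probe only sees time spent inside the FPGA fabric, so a true wire-to-wire figure would also include the transceiver and PHY on both sides.

```verilog
// Hypothetical latency probe (names are assumptions, not from the lesson).
// Counts clock cycles from rx_start (first bit of market data in)
// to tx_start (first bit of order out).
module T2TProbe (
    input  wire        clk,
    input  wire        rst,
    input  wire        rx_start,    // assumed 1-cycle pulse: market data arrives
    input  wire        tx_start,    // assumed 1-cycle pulse: order departs
    output reg  [15:0] t2t_cycles,  // last measured latency, in clock cycles
    output reg         t2t_valid    // high for one cycle when t2t_cycles updates
);
    reg [15:0] counter;
    reg        running;

    always @(posedge clk) begin
        t2t_valid <= 1'b0;                 // default: no new measurement
        if (rst) begin
            counter    <= 16'd0;
            running    <= 1'b0;
            t2t_cycles <= 16'd0;
        end else if (rx_start) begin
            counter <= 16'd1;              // next edge is 1 cycle after arrival
            running <= 1'b1;
        end else if (running) begin
            if (tx_start) begin
                t2t_cycles <= counter;     // cycles elapsed since rx_start
                t2t_valid  <= 1'b1;
                running    <= 1'b0;
            end else begin
                counter <= counter + 16'd1; // no overflow handling in this sketch
            end
        end
    end
endmodule
```

Multiplying t2t_cycles by the clock period gives the latency in time: at 300 MHz, a reading of 24 cycles would be about 80 ns.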
Deep Dive: Pipelining (The Assembly Line)
How does an FPGA process 100 Million Insert messages per second? Pipelining.
The Physics: Imagine a packet takes 100 clock cycles to process.
- CPU: Must finish Packet A before starting Packet B. Throughput = 1 packet per 100 cycles.
- FPGA: Splits the task into 100 stages, each taking one clock cycle.
- Cycle 1: Stage 1 processes Packet A.
- Cycle 2: Stage 2 processes Packet A. Stage 1 processes Packet B.
- Result: Once the pipeline is full, throughput = 1 packet per cycle, even though each packet still takes 100 cycles end to end. Massive Parallelism.
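The assembly line above can be sketched in Verilog with far fewer stages. This is a minimal 3-stage illustration, not the 100-stage design the text imagines; the module name Pipeline3, its ports, and the work done in each stage are assumptions chosen to keep the example short.

```verilog
// Minimal 3-stage pipeline sketch (hypothetical module and port names).
// Every stage is a bank of registers. On each clock edge all stages advance
// at once, so a new input can enter while older inputs are still in flight.
module Pipeline3 (
    input  wire        clk,
    input  wire        rst,
    input  wire        in_valid,
    input  wire [31:0] in_price,
    input  wire [31:0] in_limit,
    output reg         out_valid,
    output reg         out_buy
);
    // Stage 1 registers: captured inputs
    reg        s1_valid;
    reg [31:0] s1_price;
    reg [31:0] s1_limit;
    // Stage 2 registers: comparison result
    reg        s2_valid;
    reg        s2_below;

    always @(posedge clk) begin
        if (rst) begin
            s1_valid  <= 1'b0;
            s2_valid  <= 1'b0;
            out_valid <= 1'b0;
            out_buy   <= 1'b0;
        end else begin
            // Stage 1: latch the incoming fields
            s1_valid <= in_valid;
            s1_price <= in_price;
            s1_limit <= in_limit;
            // Stage 2: compare what stage 1 latched on the previous edge
            s2_valid <= s1_valid;
            s2_below <= (s1_price < s1_limit);
            // Stage 3: publish the decision
            out_valid <= s2_valid;
            out_buy   <= s2_valid & s2_below;
        end
    end
endmodule
```

Each input takes 3 clock edges to reach the output (latency), but a new input is accepted on every edge (throughput). Adding stages raises latency in cycles without lowering the rate at which packets can enter.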
Architecture: Hybrid Systems (Solarflare)
Most firms don’t use an FPGA alone. They use SmartNICs (e.g., Solarflare X3522), where the FPGA sits on the Network Card itself.
- Filtering: The FPGA drops the “Noise” packets the strategy does not care about (99% of traffic in this example).
- Forwarding: It sends only “Signal” packets to the CPU over PCIe.
- Result: CPU load drops, and latency improves on the packets that matter.
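The filtering step might look like the sketch below. It assumes each packet has already been parsed into a 16-bit instrument ID and a data word; the module name SymbolFilter, the WATCHED_ID parameter, and the field widths are all hypothetical, and a real feed handler would match on whatever fields its exchange protocol defines.

```verilog
// Hypothetical SmartNIC filter stage (names and widths are assumptions).
// Forwards a packet toward the host only when its instrument ID matches
// the one the strategy is watching; everything else is dropped as "noise".
module SymbolFilter #(
    parameter [15:0] WATCHED_ID = 16'h002A  // illustrative value only
) (
    input  wire        clk,
    input  wire        rst,
    input  wire        pkt_valid,
    input  wire [15:0] pkt_symbol_id,
    input  wire [63:0] pkt_data,
    output reg         host_valid,  // asserted only for "signal" packets
    output reg  [63:0] host_data
);
    always @(posedge clk) begin
        if (rst) begin
            host_valid <= 1'b0;
            host_data  <= 64'd0;
        end else begin
            // The match costs one clock cycle and never involves the CPU
            host_valid <= pkt_valid && (pkt_symbol_id == WATCHED_ID);
            host_data  <= pkt_data;
        end
    end
endmodule
```

Because the drop decision happens on the card, noise packets never cross PCIe and never trigger work on the host.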
Code: Verifying an Order (Verilog)
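In Software, if (price < limit) is compiled to a compare and a branch instruction that the CPU must fetch and execute.
In Hardware, if (price < limit) is synthesized into a comparator circuit whose output drives the “Buy” wire directly.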
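module OrderTrigger (
    input  wire        clk,
    input  wire [31:0] market_price,
    input  wire [31:0] limit_price,
    output reg         buy_signal
);
    always @(posedge clk) begin
        // The comparator is combinational logic; its result is registered on
        // the rising clock edge, so buy_signal updates within one clock cycle
        // (about 3.3 ns at 300 MHz).
        if (market_price < limit_price) begin
            buy_signal <= 1'b1;  // price is below the limit: buy
        end else begin
            buy_signal <= 1'b0;  // price is at or above the limit: do nothing
        end
    end
endmodule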
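One way to audit the module is a small self-checking testbench. The sketch below is hypothetical: the testbench name, the prices, and the 4 ns clock period are illustrative choices (a 300 MHz clock has a 3.33 ns period, which is awkward to write with integer delays). It assumes the OrderTrigger module above is compiled alongside it.

```verilog
`timescale 1ns/1ps
// Hypothetical self-checking testbench for the OrderTrigger module above.
module OrderTrigger_tb;
    reg         clk = 1'b0;
    reg  [31:0] market_price;
    reg  [31:0] limit_price;
    wire        buy_signal;

    OrderTrigger dut (
        .clk          (clk),
        .market_price (market_price),
        .limit_price  (limit_price),
        .buy_signal   (buy_signal)
    );

    // Illustrative clock: 4 ns period (250 MHz), not the lesson's 300 MHz
    always #2 clk = ~clk;

    initial begin
        limit_price  = 32'd100;

        market_price = 32'd105;   // above the limit: expect no buy
        @(posedge clk); #1;
        if (buy_signal !== 1'b0) $display("FAIL: bought above the limit");

        market_price = 32'd95;    // below the limit: expect a buy
        @(posedge clk); #1;
        if (buy_signal !== 1'b1) $display("FAIL: missed a buy below the limit");

        market_price = 32'd100;   // equal to the limit: strict '<' means no buy
        @(posedge clk); #1;
        if (buy_signal !== 1'b0) $display("FAIL: bought at the limit");

        $display("done");
        $finish;
    end
endmodule
```

The third case is the one worth auditing. The module uses a strict less-than, so a market price exactly equal to the limit does not trigger. Whether that is the intended behavior is a design decision the lesson does not state.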
Practice Exercises
Exercise 1: The Jitter (Beginner)
- Scenario: Measure the latency of 1000 orders.
- Software: Min 5 µs, Max 50 µs (OS jitter).
- FPGA: Min 80 ns, Max 82 ns (deterministic).
- Lesson: FPGA wins on Consistency (Standard Deviation).
Exercise 2: Throughput Math (Intermediate)
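- Scenario: FPGA Clock = 300 MHz (about 3.3 ns per cycle).
- Pipeline: Can accept 1 packet every cycle.
- Throughput: 300 × 10^6 cycles/s × 1 packet/cycle = 300 million packets per second.
- Bandwidth: If each packet is 64 bytes: 300 × 10^6 × 64 bytes × 8 bits/byte = 153.6 Gbps. The pipeline can absorb far more than a 25 Gbps link delivers, so the line rate is saturated before the FPGA is.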
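As a cross-check on the line-rate remark, the pipeline's packet rate can be compared with the most a 25 Gbps Ethernet link can deliver. This sketch assumes the standard Ethernet per-frame overhead of 20 bytes on the wire (8 bytes of preamble and start delimiter plus a 12-byte inter-frame gap), which the exercise itself does not mention.

```latex
\begin{align*}
R_{\text{pipeline}} &= 300 \times 10^{6}\ \tfrac{\text{cycles}}{\text{s}}
                       \times 1\ \tfrac{\text{packet}}{\text{cycle}}
                     = 3.0 \times 10^{8}\ \text{packets/s} \\[4pt]
R_{\text{link}}     &= \frac{25 \times 10^{9}\ \text{bit/s}}
                            {(64 + 20)\ \text{bytes} \times 8\ \text{bit/byte}}
                     \approx 3.72 \times 10^{7}\ \text{packets/s} \\[4pt]
\frac{R_{\text{pipeline}}}{R_{\text{link}}} &\approx 8.06
\end{align*}
```

Under these assumptions the pipeline has roughly eight times more capacity than the link can fill with minimum-size frames, so the network, not the FPGA logic, sets the ceiling.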
Exercise 3: Development Cost (Advanced)
- Scenario: A bug is found in algo logic.
- Software: Fix + Compile = 30 seconds.
- FPGA: Fix + Synthesis + Place & Route = 4 to 12 hours.
- Tradeoff: FPGAs are reprogrammable but slow to iterate on. Only use them for logic that rarely changes (e.g., Feed Parsing, Risk Checks).
Knowledge Check
- What is the “von Neumann Tax”?
- Why is FPGA latency “Deterministic”?
- What is Pipelining?
- Why is FPGA development slower than C++?
- What is a SmartNIC?
Answers
- Memory Access. Fetching instructions and data from RAM dominates execution time.
- No OS. No scheduling, no interrupts. Every packet takes the same fixed number of clock cycles.
- Parallel Stages. Each stage works on a different packet in the same clock cycle, so a new packet can enter every cycle.
- Synthesis and Place & Route. Mapping the design onto physical logic cells and routing the wires between them is a hard optimization problem (NP-Hard).
- FPGA + NIC. A network card with a programmable chip for offloading tasks.
Summary
- Software: Juggling.
- FPGA: Assembly Line.
- Latency: Nanoseconds.