Order Book Reconstruction in Sub-Millisecond

How to build and maintain order books from L2/L3 feeds. Sequence numbers, gap detection, and snapshot recovery.

Intermediate 25 min read Expert Version →

🎯 What You'll Learn

  • Understand L1/L2/L3 market data feeds
  • Build an order book from incremental updates
  • Handle sequence gaps and snapshot recovery
  • Optimize for sub-millisecond reconstruction

📚 Prerequisites

Before this lesson, you should understand:

Why Sub-Millisecond Matters

Your trading strategy sees a price. By the time you react, the price has moved.

Order book age: 1ms   → You might win
Order book age: 10ms  → You're trading on stale data
Order book age: 100ms → You're the sucker at the table

This lesson teaches you how to build and maintain real-time order books from exchange data feeds.


What You’ll Learn

By the end of this lesson, you’ll understand:

  1. L1/L2/L3 data formats - Trade-offs between detail and bandwidth
  2. Incremental updates - Building books from deltas
  3. Sequence handling - Detecting and recovering from gaps
  4. Performance optimization - Memory layout and data structures

The Foundation: Market Data Levels

LevelContainsUpdates/secUse Case
L1Best bid/ask only10-100Simple trading
L2Top N levels (e.g., top 10)100-1,000Most strategies
L3Every order (individual IDs)1,000-100,000Market making

Most exchanges provide L2. True L3 is rare (mainly equity markets).


The “Aha!” Moment

Here’s what makes or breaks order book systems:

Order books are event-sourced. You don’t receive the current state-you receive a stream of changes. Miss one update, and your book is wrong forever until you re-sync.

Every update has a sequence number. Gap detection is mandatory.


Let’s See It In Action: Order Book Data Structure

from collections import defaultdict
from sortedcontainers import SortedDict

class OrderBook:
    def __init__(self, symbol: str):
        self.symbol = symbol
        self.bids = SortedDict()  # price -> quantity (descending)
        self.asks = SortedDict()  # price -> quantity (ascending)
        self.sequence = 0
        
    def apply_snapshot(self, snapshot: dict):
        """Initialize from full snapshot."""
        self.bids.clear()
        self.asks.clear()
        for price, qty in snapshot['bids']:
            self.bids[-float(price)] = float(qty)  # Negative for descending
        for price, qty in snapshot['asks']:
            self.asks[float(price)] = float(qty)
        self.sequence = snapshot['sequence']
    
    def apply_update(self, update: dict) -> bool:
        """Apply incremental update. Returns False if gap detected."""
        expected_seq = self.sequence + 1
        if update['sequence'] != expected_seq:
            return False  # GAP! Need to re-sync
        
        for price, qty in update.get('bids', []):
            price = -float(price)
            if float(qty) == 0:
                self.bids.pop(price, None)
            else:
                self.bids[price] = float(qty)
        
        for price, qty in update.get('asks', []):
            price = float(price)
            if float(qty) == 0:
                self.asks.pop(price, None)
            else:
                self.asks[price] = float(qty)
        
        self.sequence = update['sequence']
        return True
    
    def best_bid(self):
        return -self.bids.peekitem(0)[0] if self.bids else None
    
    def best_ask(self):
        return self.asks.peekitem(0)[0] if self.asks else None
    
    def mid_price(self):
        bid, ask = self.best_bid(), self.best_ask()
        return (bid + ask) / 2 if bid and ask else None

Sequence Gap Detection

class SafeOrderBook(OrderBook):
    def __init__(self, symbol: str, on_gap_callback):
        super().__init__(symbol)
        self.on_gap = on_gap_callback
        self.buffered_updates = []
        self.max_buffer = 1000
    
    def process_message(self, msg: dict):
        if msg['type'] == 'snapshot':
            self.apply_snapshot(msg)
            self._replay_buffered()
        elif msg['type'] == 'update':
            if not self.apply_update(msg):
                # Gap detected!
                self._handle_gap(msg)
    
    def _handle_gap(self, msg):
        """Buffer updates and request new snapshot."""
        self.buffered_updates.append(msg)
        if len(self.buffered_updates) > self.max_buffer:
            self.buffered_updates.pop(0)
        self.on_gap(self.symbol)  # Trigger snapshot request
    
    def _replay_buffered(self):
        """Apply buffered updates after snapshot."""
        valid_updates = [u for u in self.buffered_updates 
                        if u['sequence'] > self.sequence]
        valid_updates.sort(key=lambda x: x['sequence'])
        
        for update in valid_updates:
            if not self.apply_update(update):
                break  # Still have a gap
        
        self.buffered_updates.clear()

Performance Optimization

1. Memory Layout

# Bad: Python dicts with string keys
book = {"bids": {}, "asks": {}}  # Slow, memory-intensive

# Better: NumPy arrays for numerical data
import numpy as np
bids = np.zeros((1000, 2), dtype=np.float64)  # [price, qty] pairs
asks = np.zeros((1000, 2), dtype=np.float64)

# Best: Pre-allocated fixed-size arrays with manual management

2. Update Parsing

# Bad: JSON parsing every message
data = json.loads(raw_message)

# Better: Use orjson (3-10x faster)
import orjson
data = orjson.loads(raw_message)

# Best: Binary protocols (protobuf, FlatBuffers)

3. Price Level Indexing

# For fixed-tick instruments, use array indexing
# E.g., BTC with $1 ticks: price $50,000 = index 50000
class TickBook:
    def __init__(self, min_price, max_price, tick_size):
        self.min_price = min_price
        self.tick_size = tick_size
        n_levels = int((max_price - min_price) / tick_size)
        self.bids = np.zeros(n_levels, dtype=np.float64)
        self.asks = np.zeros(n_levels, dtype=np.float64)
    
    def _price_to_idx(self, price):
        return int((price - self.min_price) / self.tick_size)
    
    def update_bid(self, price, qty):
        self.bids[self._price_to_idx(price)] = qty  # O(1)!

Common Misconceptions

Myth: “I can just poll the REST API for order book state.”
Reality: REST is seconds behind WebSocket. By the time you get the response, it’s ancient history. WebSocket with incremental updates is mandatory.

Myth: “Sequence gaps are rare, I can ignore them.”
Reality: Networks drop packets. WebSocket connections hiccup. Gaps happen in production-your code MUST handle them or your book diverges silently.

Myth: “My order book matches the exchange perfectly.”
Reality: There’s always propagation delay. Your book is a local approximation. Build strategies that tolerate this uncertainty.


Exchange Integration Examples

Binance

# Subscribe to order book updates
{
    "method": "SUBSCRIBE",
    "params": ["btcusdt@depth@100ms"],  # 100ms batching
    "id": 1
}

# Snapshot endpoint for recovery
GET /api/v3/depth?symbol=BTCUSDT&limit=1000

Coinbase

# Subscribe
{
    "type": "subscribe",
    "channels": [{"name": "level2", "product_ids": ["BTC-USD"]}]
}

# Initial snapshot comes first, then updates

Practice Exercises

Exercise 1: Build a Simple Book

# Process this sequence:
snapshot = {"bids": [["100", "10"]], "asks": [["101", "5"]], "sequence": 1}
update1 = {"bids": [["99", "20"]], "asks": [], "sequence": 2}
update2 = {"bids": [["100", "0"]], "asks": [["101", "8"]], "sequence": 3}

# What's the final state?

Exercise 2: Gap Detection

# What happens with this sequence?
snapshot = {"sequence": 100, ...}
update = {"sequence": 102, ...}  # Gap! 101 is missing

Exercise 3: Benchmark Performance

# Measure updates per second
import time
start = time.perf_counter()
for _ in range(100000):
    book.apply_update(sample_update)
elapsed = time.perf_counter() - start
print(f"{100000/elapsed:.0f} updates/sec")

Key Takeaways

  1. Event-sourcing model - Books are built from update streams
  2. Sequence numbers are critical - One missed update corrupts everything
  3. Buffer + snapshot = recovery - Handle gaps gracefully
  4. Performance is architecture - Data structures matter more than code optimization

What’s Next?

🎯 Continue learning: Market Making Fundamentals

🔬 Expert version: Order Book Reconstruction: Nanoseconds Matter

Now you can build order books that keep up with the market. 📊

Questions about this lesson? Working on related infrastructure?

Let's discuss