Building a Crypto Exchange on AWS: The Architecture of Liquidity

Reference architecture for a high-performance crypto exchange on AWS, covering the Matching Engine (LMAX pattern), Market Data Ingest, and MPC Custody integration.

Building a crypto exchange is not building a web app. It is building a bank, a brokerage, and a stock exchange, simultaneously, on infrastructure you don’t own.

Most “white-label” exchange kits fail because they treat the Matching Engine as a database transaction. It is not. It is an in-memory state machine where every microsecond of added latency costs you market maker relationships.

This post documents the reference architecture for a Tier-1 exchange running on AWS.

1. The Three Pillars

An exchange has three core systems, each with different physics:

  1. The Firehose (Ingest): Market data from Binance, Coinbase, etc. Must handle 100k+ msgs/sec.
  2. The Core (Matching): The LMAX-style in-memory order book. Latency must be deterministic and stay under 50µs.
  3. The Vault (Custody): Hot/Cold wallet hierarchy. Security > Speed.

[Diagram: Market Data Ingest (SBE), Kafka (Persist), Matching Engine (Ring Buffer), Execution, Custody (MPC)]

2. The Firehose: Pricing & Ingest

You need to ingest pricing from external exchanges to drive your internal Mark Price and Risk Engine. JSON is too slow.

The Physics of Normalization

We normalize all incoming data to SBE (Simple Binary Encoding).

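  • JSON: roughly 500ns to parse a small message, plus Garbage Collection pressure from the objects it allocates.
  • SBE: roughly 5ns to decode. Fields sit at fixed offsets, so reads are zero-copy and cache-friendly.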
<!-- SBE Schema Example -->
<sbe:message name="NewOrder" id="1">
    <field name="orderId" id="1" type="uint64"/>
    <field name="price" id="2" type="uint64"/>
    <field name="quantity" id="3" type="uint64"/>
    <field name="side" id="4" type="SideEnum"/>
</sbe:message>
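
To make the fixed-offset idea concrete, here is a minimal sketch of encoding and decoding the NewOrder message by hand. It assumes the standard 8-byte SBE message header (blockLength, templateId, schemaId, version), little-endian byte order, and a one-byte encoding for SideEnum; the schema id, version, and enum values are hypothetical. Python is used only to show the layout. A real deployment would use codecs generated from the schema.

```python
import struct

# Standard SBE message header: blockLength, templateId, schemaId, version (uint16 each).
HEADER = struct.Struct("<HHHH")
# NewOrder body: orderId, price, quantity as uint64, then side as uint8
# (the uint8 encoding of SideEnum is an assumption, not taken from the schema).
NEW_ORDER = struct.Struct("<QQQB")

TEMPLATE_ID = 1             # matches id="1" on the NewOrder message
SCHEMA_ID = 1               # hypothetical
SCHEMA_VERSION = 0          # hypothetical
SIDE_BUY, SIDE_SELL = 0, 1  # hypothetical enum values


def encode_new_order(order_id, price, quantity, side):
    """Write header and body into one pre-sized buffer at fixed offsets."""
    buf = bytearray(HEADER.size + NEW_ORDER.size)
    HEADER.pack_into(buf, 0, NEW_ORDER.size, TEMPLATE_ID, SCHEMA_ID, SCHEMA_VERSION)
    NEW_ORDER.pack_into(buf, HEADER.size, order_id, price, quantity, side)
    return bytes(buf)


def decode_new_order(buf):
    """Read fields straight out of the buffer; no tokenizing, no intermediate tree."""
    view = memoryview(buf)
    _block_length, template_id, _schema_id, _version = HEADER.unpack_from(view, 0)
    if template_id != TEMPLATE_ID:
        raise ValueError(f"unexpected templateId {template_id}")
    order_id, price, quantity, side = NEW_ORDER.unpack_from(view, HEADER.size)
    return {"orderId": order_id, "price": price, "quantity": quantity, "side": side}
```

Prices and quantities are integers here (for example, a price expressed in ticks), which matches the uint64 fields in the schema and avoids floating-point rounding.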

Multicast in the Cloud

AWS VPCs do not support native L2 Multicast. To broadcast market data to internal consumers (Risk, Matching, Analytics), we use AWS Transit Gateway Multicast or a VXLAN Overlay.
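
On the consumer side, joining a multicast group looks the same as on any IP network. The sketch below assumes the Transit Gateway multicast domain is configured for IGMP group membership; the function names are hypothetical, and the group address and port are placeholders supplied by the caller.

```python
import socket


def membership_request(group_ip, interface_ip="0.0.0.0"):
    """Build the 8-byte ip_mreq value (group address + local interface address)."""
    return socket.inet_aton(group_ip) + socket.inet_aton(interface_ip)


def open_receiver(group_ip, port, interface_ip="0.0.0.0"):
    """Open a UDP socket and join the multicast group on the given interface."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(
        socket.IPPROTO_IP,
        socket.IP_ADD_MEMBERSHIP,
        membership_request(group_ip, interface_ip),
    )
    return sock
```

Each internal consumer (Risk, Matching, Analytics) would call something like `open_receiver` once at startup and then read datagrams with `recvfrom`.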

3. The Core: The LMAX Disruptor

The Matching Engine cannot use a database. A lock-and-commit round trip through PostgreSQL costs on the order of 1-2ms, and we need microsecond-level latency. We use the LMAX Disruptor pattern: single-threaded, non-blocking, in-memory.
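
As an illustration of "matching as an in-memory state machine", here is a minimal price-time priority book. It is a sketch under simplifying assumptions (limit orders only, integer prices, no cancels), and the class and method names are hypothetical. Python shows the logic only; it says nothing about latency.

```python
import heapq
from itertools import count


class OrderBook:
    """Single-threaded limit order book with price-time priority."""

    def __init__(self):
        self._bids = []      # heap of (-price, seq, order_id, [remaining_qty])
        self._asks = []      # heap of (price, seq, order_id, [remaining_qty])
        self._seq = count()  # arrival order breaks ties at the same price

    def submit(self, order_id, side, price, qty):
        """Match an incoming order; return fills as (maker_id, taker_id, price, qty)."""
        fills = []
        if side == "buy":
            own, opposite, key = self._bids, self._asks, -price
            crosses = lambda best_key: best_key <= price   # best ask <= bid price
        else:
            own, opposite, key = self._asks, self._bids, price
            crosses = lambda best_key: -best_key >= price  # best bid >= ask price
        while qty > 0 and opposite and crosses(opposite[0][0]):
            best_key, _, maker_id, remaining = opposite[0]
            traded = min(qty, remaining[0])
            fills.append((maker_id, order_id, abs(best_key), traded))  # maker's price
            qty -= traded
            remaining[0] -= traded
            if remaining[0] == 0:
                heapq.heappop(opposite)
        if qty > 0:  # the unfilled remainder rests on the book
            heapq.heappush(own, (key, next(self._seq), order_id, [qty]))
        return fills
```

Because `submit` is a pure function of the current book and the incoming order, feeding the same sequence of orders always produces the same fills. That determinism is what makes replay-based recovery possible.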

Ring Buffer Physics

Instead of conventional lock-based queues, we use a pre-allocated Ring Buffer.

  • Memory Barriers: We use Java's sun.misc.Unsafe or C++ std::atomic to control when a write becomes visible to other threads.
  • Cache Line Padding: Every entry in the ring buffer is padded to 64 bytes (the cache line size on x86 CPUs) to prevent False Sharing.
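
The indexing logic of a single-producer, single-consumer ring buffer can be sketched as follows. Python has no explicit memory barriers or cache-line control, so this only shows pre-allocation, power-of-two masking, and the two sequence counters; comments mark where a Java or C++ version would need release/acquire ordering. All names are hypothetical.

```python
class RingBuffer:
    """Pre-allocated ring buffer for one producer and one consumer."""

    def __init__(self, size):
        if size <= 0 or size & (size - 1):
            raise ValueError("size must be a power of two")
        self._mask = size - 1
        # Slots are allocated once and overwritten in place, never reallocated.
        self._slots = [[0, 0, 0, 0] for _ in range(size)]
        self._published = -1  # last sequence written by the producer
        self._consumed = -1   # last sequence read by the consumer

    def try_publish(self, order_id, price, quantity, side):
        next_seq = self._published + 1
        if next_seq - self._consumed > len(self._slots):
            return False  # buffer full: the consumer has not caught up
        slot = self._slots[next_seq & self._mask]
        slot[0], slot[1], slot[2], slot[3] = order_id, price, quantity, side
        # In Java/C++ this store needs release semantics so that the slot
        # contents are visible before the new sequence number is.
        self._published = next_seq
        return True

    def try_consume(self):
        # In Java/C++ this load needs acquire semantics.
        if self._consumed == self._published:
            return None  # nothing new
        next_seq = self._consumed + 1
        event = tuple(self._slots[next_seq & self._mask])
        self._consumed = next_seq
        return event
```

The power-of-two size lets `seq & mask` replace a modulo, and keeping sequences monotonically increasing (rather than wrapping indices) makes the full and empty checks simple subtractions.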

The AWS Setup

  • Instance: c6i.metal. We disable Hyper-threading and pin the Matching Loop to Core 0 (isolated).
  • State: The entire order book lives in RAM vectors.
  • Recovery: On crash, we replay the Input Event Stream (Kafka + S3 Snapshots) to rebuild the in-memory state.
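
The recovery step can be sketched as "load the latest snapshot, then apply every journaled event after the snapshot's offset". The snapshot shape, event format, and function names below are assumptions made for illustration; reading from Kafka and S3 is left out.

```python
def apply_event(open_orders, event):
    """Apply one input event to the in-memory state (deterministic)."""
    kind = event["type"]
    if kind == "new":
        open_orders[event["order_id"]] = event["qty"]
    elif kind == "cancel":
        open_orders.pop(event["order_id"], None)
    else:
        raise ValueError(f"unknown event type {kind!r}")


def recover(snapshot, journal):
    """Rebuild state from a snapshot plus an ordered list of (offset, event) pairs.

    Events at or before the snapshot offset are skipped; a gap in offsets
    aborts recovery rather than silently losing an order.
    """
    open_orders = dict(snapshot["open_orders"])
    last_offset = snapshot["offset"]
    for offset, event in journal:
        if offset <= last_offset:
            continue  # already reflected in the snapshot
        if offset != last_offset + 1:
            raise RuntimeError(f"journal gap: expected {last_offset + 1}, got {offset}")
        apply_event(open_orders, event)
        last_offset = offset
    return open_orders, last_offset
```

Refusing to continue past a gap is a design choice in this sketch: an exchange that must not lose orders is better off failing recovery loudly than starting with a book that is missing events.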

The Decision Matrix: Journaling Storage

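| Storage for WAL       | Write Latency | Durability      | Verdict               |
|-----------------------|---------------|-----------------|-----------------------|
| A. EBS gp3            | ~1ms          | High            | Too slow for journal. |
| B. Instance Store NVMe| ~50µs         | Low (ephemeral) | Fast but risky.       |
| C. FSx for Lustre     | ~100µs        | High            | Selected.             |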
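
Whichever storage is chosen, the write latency in the table is paid on every journaled event, because an event is only safe to acknowledge after it has been flushed. A minimal sketch of such a journal, assuming a length-plus-CRC32 record framing (the framing and function names are hypothetical):

```python
import os
import struct
import zlib

RECORD_HEADER = struct.Struct("<II")  # payload length, CRC32 of payload


def append_record(fd, payload):
    """Append one framed record and flush it to stable storage before returning."""
    record = RECORD_HEADER.pack(len(payload), zlib.crc32(payload)) + payload
    os.write(fd, record)  # small appends to a regular file; partial writes not handled here
    os.fsync(fd)          # this call is where the storage write latency is paid


def read_records(path):
    """Return all intact records, stopping at the first torn or corrupt one."""
    with open(path, "rb") as f:
        data = f.read()
    records, pos = [], 0
    while pos + RECORD_HEADER.size <= len(data):
        length, crc = RECORD_HEADER.unpack_from(data, pos)
        start = pos + RECORD_HEADER.size
        payload = data[start:start + length]
        if len(payload) < length or zlib.crc32(payload) != crc:
            break  # incomplete tail from a crash mid-write
        records.append(payload)
        pos = start + length
    return records
```

Timing `append_record` in a loop on each candidate volume is one simple way to reproduce the kind of comparison shown in the table for your own workload.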

4. The Vault: Hot, Warm, and Cold Custody

Custody is the most dangerous part of the stack.

Withdrawal Flow

  1. User requests withdrawal.
  2. Risk Engine checks: KYC status, internal ledger balance, Chainalysis sanctions.
  3. Hot Wallet (MPC): If the risk checks pass and the amount is < $10K, auto-sign via the Fireblocks API.
  4. Cold Storage: If the hot wallet is empty, a manual signing ceremony (M-of-N keys) moves funds from cold to hot.
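
The flow above can be sketched as a pure routing function. The risk inputs are passed in as precomputed flags, no custody API is called, and the handling of amounts at or above the limit (manual review) is an assumption, since the flow does not specify it. All names are hypothetical.

```python
from dataclasses import dataclass
from decimal import Decimal

AUTO_SIGN_LIMIT_USD = Decimal("10000")


@dataclass(frozen=True)
class Withdrawal:
    user_id: str
    amount_usd: Decimal
    kyc_verified: bool
    ledger_balance_usd: Decimal
    sanctions_hit: bool  # result of the sanctions screening, computed elsewhere


def route_withdrawal(w, hot_wallet_balance_usd):
    """Decide what happens to a withdrawal request; returns an action label."""
    # Step 2: risk checks.
    if not w.kyc_verified:
        return "REJECT_KYC"
    if w.ledger_balance_usd < w.amount_usd:
        return "REJECT_INSUFFICIENT_BALANCE"
    if w.sanctions_hit:
        return "REJECT_SANCTIONS"
    # Assumption: anything at or above the limit goes to a human.
    if w.amount_usd >= AUTO_SIGN_LIMIT_USD:
        return "MANUAL_REVIEW"
    # Step 4: the hot wallet cannot cover it, so cold storage must refill it first.
    if hot_wallet_balance_usd < w.amount_usd:
        return "REFILL_FROM_COLD"
    # Step 3: small, clean, and covered by the hot wallet.
    return "AUTO_SIGN"
```

Keeping the decision separate from the signing call makes the policy easy to unit test and to audit without touching real keys.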

5. Systems Thinking: The Trade-offs

  1. Ledger Consistency: Your internal ledger (SQL) and the blockchain will drift. Run a reconciliation bot every 4 hours. If Assets - Liabilities < 0, halt withdrawals immediately.
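  2. Regulatory Audit: Record-keeping rules such as MiFID II require a complete, timestamped record of every order. Raw packet captures (PCAPs) are one way to evidence this: use AWS Traffic Mirroring on the Matching Engine ENI to copy packets to a capture host that writes them to S3 for 7-year retention.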
  3. Chaos Engineering: Test with AWS FIS. Your RTO must be <10 seconds. Your RPO must be zero (no lost orders).
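
The reconciliation rule in item 1 can be sketched as a per-asset solvency check. Comparing asset by asset (rather than one aggregate number) is an assumption of this sketch, as are the function names; fetching on-chain balances and ledger totals is out of scope.

```python
from decimal import Decimal


def reconcile(on_chain_assets, ledger_liabilities):
    """Return {asset: shortfall} for every asset where assets - liabilities < 0."""
    shortfalls = {}
    for asset in set(on_chain_assets) | set(ledger_liabilities):
        assets = on_chain_assets.get(asset, Decimal(0))
        liabilities = ledger_liabilities.get(asset, Decimal(0))
        difference = assets - liabilities
        if difference < 0:
            shortfalls[asset] = difference
    return shortfalls


def withdrawals_allowed(on_chain_assets, ledger_liabilities):
    """Withdrawals stay open only if no asset shows a shortfall."""
    return not reconcile(on_chain_assets, ledger_liabilities)
```

Decimal is used instead of float so that balances compare exactly; a rounding error in this check would either halt withdrawals for no reason or miss a real shortfall.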

6. The Philosophy

An exchange is a trust machine. Users deposit assets and trust you to return them. Your architecture must be worthy of that trust.

Every component, from the ingest layer to the custody vault, must be designed with the assumption that it will fail. The question is not “if” but “when” and “how fast can you recover?”

This is not a web app. This is critical infrastructure.
