Crypto Exchange Architecture on AWS
Building a crypto exchange infrastructure on AWS. VPC design, security groups, HSM integration, and disaster recovery.
🎯 What You'll Learn
- Design a secure VPC for exchange infrastructure
- Implement proper security group rules
- Integrate AWS HSM for key management
- Plan for disaster recovery and failover
Why AWS for Crypto Exchanges?
Despite higher latency than colocated bare-metal servers, many crypto exchanges run on AWS because:
- Fast iteration - Go live in days, not months
- Security certifications - AWS infrastructure is already audited for SOC 2 and ISO 27001, though your own stack still needs its own audit
- Global presence - Regions near major crypto markets
- Managed services - Less operational burden
This lesson covers architecture patterns for exchange infrastructure on AWS.
What You’ll Learn
By the end of this lesson, you’ll understand:
- VPC architecture - Network isolation and segmentation
- Security groups - Principle of least privilege
- Key management - HSM integration for crypto operations
- Disaster recovery - Multi-region failover strategies
The Foundation: VPC Design
A proper exchange VPC has multiple layers:
┌────────────────────────────────────────────────────────────┐
│ VPC (10.0.0.0/16)                                          │
│                                                            │
│ ┌────────────────────────────────────────────────────────┐ │
│ │ Public Subnet (10.0.1.0/24)                            │ │
│ │ [ALB] [NAT Gateway] [Bastion]                          │ │
│ └────────────────────────────────────────────────────────┘ │
│                              │                             │
│ ┌────────────────────────────────────────────────────────┐ │
│ │ Private Subnet - App (10.0.2.0/24)                     │ │
│ │ [API Servers] [Matching Engine] [Order Manager]        │ │
│ └────────────────────────────────────────────────────────┘ │
│                              │                             │
│ ┌────────────────────────────────────────────────────────┐ │
│ │ Private Subnet - Data (10.0.3.0/24)                    │ │
│ │ [RDS] [ElastiCache] [DocumentDB]                       │ │
│ └────────────────────────────────────────────────────────┘ │
│                              │                             │
│ ┌────────────────────────────────────────────────────────┐ │
│ │ Private Subnet - HSM (10.0.4.0/24)                     │ │
│ │ [CloudHSM] [Key Management]                            │ │
│ └────────────────────────────────────────────────────────┘ │
└────────────────────────────────────────────────────────────┘
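A layout like this is easy to sanity-check before any infrastructure exists. The sketch below uses Python's standard `ipaddress` module to confirm that each subnet sits inside the VPC range and that no two subnets overlap. The CIDR blocks come from the diagram; the function name `validate_layout` is just an illustrative choice.

```python
import ipaddress

# CIDR blocks copied from the diagram above.
VPC_CIDR = "10.0.0.0/16"
SUBNETS = {
    "public": "10.0.1.0/24",
    "app": "10.0.2.0/24",
    "data": "10.0.3.0/24",
    "hsm": "10.0.4.0/24",
}


def validate_layout(vpc_cidr: str, subnets: dict) -> list:
    """Return a list of problems; an empty list means the layout is consistent."""
    vpc = ipaddress.ip_network(vpc_cidr)
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in subnets.items()}
    problems = []

    # Every subnet must fall inside the VPC range.
    for name, net in nets.items():
        if not net.subnet_of(vpc):
            problems.append(f"{name} ({net}) is outside the VPC range {vpc}")

    # No two subnets may share addresses.
    names = sorted(nets)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if nets[a].overlaps(nets[b]):
                problems.append(f"{a} ({nets[a]}) overlaps {b} ({nets[b]})")
    return problems


if __name__ == "__main__":
    print(validate_layout(VPC_CIDR, SUBNETS) or "layout OK")
```

Running the same check in CI against the values in your Terraform variables is one way to catch a copy-paste mistake in a CIDR before `terraform apply` does.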
The “Aha!” Moment
Here’s what separates secure exchanges from hacked ones:
The matching engine should NEVER be directly accessible from the internet. All external traffic enters through the load balancer in the public subnet and reaches the engine only via the API layer. The matching engine lives in a private subnet with NO inbound rules except from the API layer.
Network segmentation is your first defense.
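One way to make that rule testable is to write the allowed flows down as data and assert on them. The sketch below is a toy model, not an AWS API: the flow list mirrors the layering described in this lesson, and the names `ALLOWED_FLOWS`, `direct_sources`, and `hops_from` are made up for illustration.

```python
from typing import Optional

# Allowed flows from the lesson's layering: each pair is (source, destination).
ALLOWED_FLOWS = {
    ("internet", "alb"),
    ("alb", "api"),
    ("api", "matching"),
    ("matching", "rds"),
}


def direct_sources(target: str, flows: set) -> set:
    """Everything allowed to open a connection straight to `target`."""
    return {src for src, dst in flows if dst == target}


def hops_from(source: str, target: str, flows: set) -> Optional[int]:
    """Fewest hops from source to target (breadth-first), or None if unreachable."""
    frontier, seen, hops = {source}, {source}, 0
    while frontier:
        if target in frontier:
            return hops
        # Advance one hop, skipping nodes that were already visited.
        frontier = {dst for src, dst in flows if src in frontier} - seen
        seen |= frontier
        hops += 1
    return None


if __name__ == "__main__":
    print(direct_sources("matching", ALLOWED_FLOWS))
    print(hops_from("internet", "matching", ALLOWED_FLOWS))
```

If someone adds a debugging rule such as `("internet", "matching")`, `direct_sources` starts returning `internet` and the hop count drops to 1, which is exactly the regression you want a test to catch.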
Let’s See It In Action: Terraform VPC
# exchange-vpc.tf
# Note: every subnet sits in us-east-1a to keep the example short. Production
# layouts repeat each tier in a second AZ (an ALB needs subnets in two AZs).
resource "aws_vpc" "exchange" {
  cidr_block           = "10.0.0.0/16"
  enable_dns_hostnames = true
  enable_dns_support   = true

  tags = {
    Name        = "exchange-vpc"
    Environment = "production"
  }
}

# Public subnet for ALB, NAT, Bastion
resource "aws_subnet" "public" {
  vpc_id                  = aws_vpc.exchange.id
  cidr_block              = "10.0.1.0/24"
  availability_zone       = "us-east-1a"
  map_public_ip_on_launch = true
}

# Private subnet for application servers
resource "aws_subnet" "app" {
  vpc_id            = aws_vpc.exchange.id
  cidr_block        = "10.0.2.0/24"
  availability_zone = "us-east-1a"
}

# Private subnet for databases
resource "aws_subnet" "data" {
  vpc_id            = aws_vpc.exchange.id
  cidr_block        = "10.0.3.0/24"
  availability_zone = "us-east-1a"
}

# Private subnet for HSM
resource "aws_subnet" "hsm" {
  vpc_id            = aws_vpc.exchange.id
  cidr_block        = "10.0.4.0/24"
  availability_zone = "us-east-1a"
}
Security Groups: Least Privilege
# ALB security group - only public entry point
resource "aws_security_group" "alb" {
  name        = "exchange-alb"
  description = "Allow HTTPS from internet"
  vpc_id      = aws_vpc.exchange.id

  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}
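# App server - only from ALB
resource "aws_security_group" "app" {
  name        = "exchange-app"
  description = "API servers"
  vpc_id      = aws_vpc.exchange.id

  ingress {
    from_port       = 8080
    to_port         = 8080
    protocol        = "tcp"
    security_groups = [aws_security_group.alb.id] # Only ALB!
  }

  # Terraform removes the default allow-all egress rule, so the path to the
  # matching engine must be declared. A CIDR is used instead of the matching
  # security group to avoid a dependency cycle between the two groups.
  egress {
    from_port   = 9000
    to_port     = 9000
    protocol    = "tcp"
    cidr_blocks = [aws_subnet.app.cidr_block]
  }
}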
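# Matching engine - only from app servers
resource "aws_security_group" "matching" {
  name        = "exchange-matching"
  description = "Matching engine - no internet access"
  vpc_id      = aws_vpc.exchange.id

  ingress {
    from_port       = 9000
    to_port         = 9000
    protocol        = "tcp"
    security_groups = [aws_security_group.app.id] # Only app servers!
  }

  # No egress to internet
  egress {
    from_port       = 5432
    to_port         = 5432
    protocol        = "tcp"
    security_groups = [aws_security_group.rds.id] # Only DB
  }
}

# Database group referenced above; the name and description are illustrative.
# Its ingress rule is a separate resource to avoid a dependency cycle.
resource "aws_security_group" "rds" {
  name        = "exchange-rds"
  description = "Database - reachable only from the matching engine"
  vpc_id      = aws_vpc.exchange.id
}
resource "aws_security_group_rule" "rds_from_matching" {
  type                     = "ingress"
  from_port                = 5432
  to_port                  = 5432
  protocol                 = "tcp"
  security_group_id        = aws_security_group.rds.id
  source_security_group_id = aws_security_group.matching.id
}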
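Least privilege is only real if you keep checking it. As a rough sketch, the function below scans security groups for ingress rules open to the whole internet and reports everything except an allow-listed public port. The input is assumed to have the same shape as the `SecurityGroups` list returned by boto3's `ec2.describe_security_groups()`; the sample data, including the stray SSH rule, is invented for the example.

```python
ANYWHERE = {"0.0.0.0/0", "::/0"}


def find_open_ingress(groups, allowed_public_ports=frozenset({443})):
    """Return one finding per ingress rule that is open to the whole internet."""
    findings = []
    for group in groups:
        for perm in group.get("IpPermissions", []):
            cidrs = [r["CidrIp"] for r in perm.get("IpRanges", [])]
            cidrs += [r["CidrIpv6"] for r in perm.get("Ipv6Ranges", [])]
            if not ANYWHERE.intersection(cidrs):
                continue  # rule is scoped to specific ranges or groups
            if perm.get("IpProtocol") == "-1":
                ports = "all ports"  # "-1" means every protocol and port
            else:
                lo, hi = perm.get("FromPort"), perm.get("ToPort")
                if lo == hi and lo in allowed_public_ports:
                    continue  # the intended public entry point
                ports = f"port {lo}" if lo == hi else f"ports {lo}-{hi}"
            findings.append(f"{group['GroupName']}: {ports} open to the internet")
    return findings


SAMPLE_GROUPS = [
    {
        "GroupName": "exchange-alb",
        "IpPermissions": [
            {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
             "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
        ],
    },
    {
        "GroupName": "exchange-app",
        "IpPermissions": [
            {"IpProtocol": "tcp", "FromPort": 8080, "ToPort": 8080,
             "IpRanges": [], "UserIdGroupPairs": [{"GroupId": "sg-alb"}]},
            # Hypothetical leftover debugging rule: SSH from anywhere.
            {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
             "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
        ],
    },
]

if __name__ == "__main__":
    for finding in find_open_ingress(SAMPLE_GROUPS):
        print(finding)
```

To run it against a real account you would pass in the `SecurityGroups` list from `describe_security_groups()` instead of the sample data.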
CloudHSM for Key Management
Crypto exchanges need HSM for:
- Hot wallet signing keys
- API key encryption
- Keyed password hashing (the pepper or HMAC key stays in the HSM)
# Simplified CloudHSM integration
import os

import boto3
# cloudhsm_pkcs11 is assumed here to be an in-house wrapper around the
# CloudHSM PKCS#11 library; swap in whatever PKCS#11 binding you use.
from cloudhsm_pkcs11 import sign_transaction


class SecureWallet:
    def __init__(self, hsm_cluster_id: str):
        # Control-plane client for cluster management; signing goes through PKCS#11.
        self.client = boto3.client('cloudhsmv2')
        self.cluster_id = hsm_cluster_id

    def sign_withdrawal(self, transaction: bytes, key_label: str) -> bytes:
        """Sign transaction using key stored in HSM."""
        # Key never leaves the HSM
        signature = sign_transaction(
            pkcs11_library="/opt/cloudhsm/lib/libcloudhsm_pkcs11.so",
            pin=os.environ['HSM_PIN'],
            key_label=key_label,
            data=transaction
        )
        return signature
Cost: ~$5,000/month for CloudHSM cluster (2 HSMs minimum for HA)
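The "key never leaves the HSM" property is easiest to keep when application code only ever handles a key label and a signature. Here is a minimal sketch of that boundary that runs without hardware. Everything in it is an assumption for illustration: `Signer`, `FakeHsm`, and this `sign_withdrawal` are hypothetical names, and the HMAC-SHA256 "signature" is only a stand-in, not how CloudHSM signs blockchain transactions.

```python
import hashlib
import hmac
from typing import Protocol


class Signer(Protocol):
    """Anything that can sign data with a key identified by label."""

    def sign(self, key_label: str, data: bytes) -> bytes: ...


class FakeHsm:
    """Test double: keeps keys internal and exposes only sign()."""

    def __init__(self, keys: dict):
        self._keys = dict(keys)

    def sign(self, key_label: str, data: bytes) -> bytes:
        if key_label not in self._keys:
            raise KeyError(f"no key with label {key_label!r}")
        # Stand-in for a real signature: HMAC-SHA256 over the data.
        return hmac.new(self._keys[key_label], data, hashlib.sha256).digest()


def sign_withdrawal(signer: Signer, transaction: bytes, key_label: str) -> bytes:
    """Callers pass a label and get a signature back; key bytes never appear here."""
    if not transaction:
        raise ValueError("refusing to sign an empty transaction")
    return signer.sign(key_label, transaction)


if __name__ == "__main__":
    hsm = FakeHsm({"hot-wallet-1": b"test-only-secret"})
    print(sign_withdrawal(hsm, b"withdraw 1 BTC", "hot-wallet-1").hex())
```

Because the wallet code depends only on the `Signer` interface, unit tests can use the fake while production wires in a PKCS#11-backed implementation.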
Common Misconceptions
Myth: “Security groups are like firewalls: set once and forget.”
Reality: Security groups should be audited monthly. Developers add rules for debugging and forget to remove them. Use AWS Config to detect violations.
Myth: “Multi-AZ RDS is enough for disaster recovery.”
Reality: Multi-AZ protects against AZ failure, not region failure. For a crypto exchange, you need cross-region replication and a DR runbook.
Myth: “AWS manages security, so I don’t need to.”
Reality: AWS secures the infrastructure; you secure the configuration. Many cloud breaches trace back to misconfigured S3 buckets or overly permissive security groups.
Disaster Recovery Strategy
| Tier | RTO | RPO | Strategy | Cost |
|---|---|---|---|---|
| Backup & Restore | Hours | Hours | S3 cross-region | $ |
| Pilot Light | Minutes | Seconds | Standby DB in DR region | $$ |
| Warm Standby | Seconds | Seconds | Scaled-down DR region | $$$ |
| Active-Active | 0 | 0 | Full production in 2 regions | $$$$ |
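For exchanges, Warm Standby is the minimum; serious operations should run Active-Active. Treat the zeros in the last row as targets rather than guarantees, since cross-region replication is typically asynchronous.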
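The table can also be read as a lookup: given the worst RTO and RPO you can tolerate, pick the cheapest tier that meets both. The sketch below encodes the table's qualitative values as ordinal ranks. The ranking and the function name `cheapest_tier` are illustrative assumptions, not AWS terminology.

```python
# Ordinal ranks for the qualitative values in the table: lower is stricter.
RANK = {"0": 0, "Seconds": 1, "Minutes": 2, "Hours": 3}

# (tier, RTO, RPO) rows from the table, ordered from cheapest to most expensive.
TIERS = [
    ("Backup & Restore", "Hours", "Hours"),
    ("Pilot Light", "Minutes", "Seconds"),
    ("Warm Standby", "Seconds", "Seconds"),
    ("Active-Active", "0", "0"),
]


def cheapest_tier(max_rto: str, max_rpo: str) -> str:
    """Return the first (cheapest) tier whose RTO and RPO both meet the limits."""
    for name, rto, rpo in TIERS:
        if RANK[rto] <= RANK[max_rto] and RANK[rpo] <= RANK[max_rpo]:
            return name
    raise ValueError("no tier satisfies the requirement")


if __name__ == "__main__":
    # An exchange that can tolerate seconds of downtime and seconds of data loss.
    print(cheapest_tier("Seconds", "Seconds"))
```

Writing the requirement down this way forces the business to state its RTO and RPO explicitly, which is usually the hard part of DR planning.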
High-Availability Architecture
Region: us-east-1                   Region: us-west-2 (DR)
┌─────────────────────┐             ┌─────────────────────┐
│ [ALB] ─── [API]     │             │ [ALB] ─── [API]     │
│      ├── [Match]    │  ────────▶  │      ├── [Match]    │
│      └── [RDS-Pri]  │ Replication │      └── [RDS-Read] │
└─────────────────────┘             └─────────────────────┘
           │                                   │
           └───── Route 53 Health Checks ──────┘
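The health-check arrow in the diagram hides a decision: how many failed checks justify moving traffic? The sketch below models only that decision and does not call Route 53. The class name `FailoverMonitor`, the threshold of 3, and the choice to stay in the DR region until someone deliberately fails back are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class FailoverMonitor:
    failure_threshold: int = 3          # assumed: consecutive failures before failover
    active_region: str = "us-east-1"
    dr_region: str = "us-west-2"
    _consecutive_failures: int = field(default=0, init=False, repr=False)

    def record(self, primary_healthy: bool) -> str:
        """Record one health-check result; return the region that should serve traffic."""
        if primary_healthy:
            self._consecutive_failures = 0
        else:
            self._consecutive_failures += 1
            if self._consecutive_failures >= self.failure_threshold:
                # Sticky by design: failing back is a manual runbook step.
                self.active_region = self.dr_region
        return self.active_region


if __name__ == "__main__":
    monitor = FailoverMonitor()
    for healthy in [False, False, True, False, False, False, True]:
        print(monitor.record(healthy))
```

A single failed check is ignored and a recovery resets the counter, so a brief network blip does not trigger a region move. Keeping failback manual avoids flapping between regions while the primary database is still catching up.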
Practice Exercises
Exercise 1: Draw Your VPC
Create a VPC diagram for your exchange:
- How many subnets?
- What goes in each?
- Which can reach the internet?
Exercise 2: Security Group Audit
For each security group, answer:
1. Who/what can initiate connections?
2. To what ports?
3. Why?
If you can't answer "why," the rule shouldn't exist.
Exercise 3: DR Runbook
Write steps for region failover:
1. How do you detect the outage?
2. How do you fail over DNS?
3. How do you promote the DR database?
4. How do you fail back?
Key Takeaways
- Network segmentation is fundamental - Public, app, data, HSM subnets
- Security groups = least privilege - Only what’s needed, nothing more
- HSM for critical keys - Signing keys never leave hardware
- Plan for failure - DR strategy before you need it
What’s Next?
🎯 Continue learning: Security Architecture for Trading
🔬 Expert version: Building a Crypto Exchange on AWS
Now you can architect secure exchange infrastructure. 🏗️