Scaling & Performance
Vertical vs horizontal. Load balancing. Read replicas. The optimization checklist.
Vertical vs Horizontal Scaling
Vertical scaling (scale up) — Give the server more CPU, RAM, disk.
Simple, and no code changes. But: there's a ceiling (the biggest available instance), it's still a single point of failure, and cost grows faster than capacity.
Horizontal scaling (scale out) — Add more servers. Run multiple instances behind a load balancer.
Requires stateless architecture (no local state). Much higher ceiling. Handles failures better.
The usual path: start vertical, then go horizontal when you hit the ceiling.
Rule: Make your service stateless first. State (sessions, cache) goes to external services (Redis, DB). Then horizontal scaling is trivial.
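A minimal sketch of that rule: session state lives behind an external-store interface, so any instance can serve any request. In production the store would be Redis; the in-memory class here is a stand-in for illustration, and all names are hypothetical.

```typescript
// Session state behind an external-store interface — the instance
// handling the request keeps no local state of its own.
interface SessionStore {
  get(sessionId: string): Promise<Record<string, unknown> | undefined>;
  set(sessionId: string, data: Record<string, unknown>): Promise<void>;
}

// Stand-in for a Redis-backed implementation (hypothetical).
class InMemoryStore implements SessionStore {
  private data = new Map<string, Record<string, unknown>>();
  async get(id: string) { return this.data.get(id); }
  async set(id: string, value: Record<string, unknown>) { this.data.set(id, value); }
}

// Any instance can handle the request: it reads the session from the
// shared store instead of local memory, so scaling out "just works".
async function handleRequest(store: SessionStore, sessionId: string): Promise<number> {
  const session = (await store.get(sessionId)) ?? { visits: 0 };
  session.visits = (session.visits as number) + 1;
  await store.set(sessionId, session);
  return session.visits as number;
}
```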
Load Balancing
A load balancer distributes requests across multiple backend instances.
Algorithms:
• Round Robin — each server in turn. Simple, even distribution.
• Least Connections — send to server with fewest active connections.
• IP Hash — same client always goes to same server (sticky sessions).
• Weighted Round Robin — servers receive traffic in proportion to their capacity.
• Random — surprisingly effective for stateless services.
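Three of the algorithms above can be sketched in a few lines each. This is illustrative code, not a real load balancer's API — the class and function names are mine.

```typescript
// Round Robin: each server in turn.
class RoundRobin {
  private i = 0;
  constructor(private servers: string[]) {}
  next(): string {
    return this.servers[this.i++ % this.servers.length];
  }
}

// Least Connections: pick the server with the fewest active connections.
class LeastConnections {
  private active = new Map<string, number>();
  constructor(servers: string[]) {
    servers.forEach((s) => this.active.set(s, 0));
  }
  next(): string {
    let best = "";
    let min = Infinity;
    for (const [server, count] of this.active) {
      if (count < min) { min = count; best = server; }
    }
    this.active.set(best, min + 1);
    return best;
  }
  release(server: string) {
    this.active.set(server, (this.active.get(server) ?? 1) - 1);
  }
}

// IP Hash: the same client IP always maps to the same server —
// this is one way sticky sessions are implemented at the LB.
function ipHash(clientIp: string, servers: string[]): string {
  let h = 0;
  for (const ch of clientIp) h = (h * 31 + ch.charCodeAt(0)) >>> 0;
  return servers[h % servers.length];
}
```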
Layer 4 (TCP) vs Layer 7 (HTTP):
• L4: Routes based on IP/TCP. Fast, no HTTP knowledge.
• L7: Routes based on HTTP (headers, path, cookies). Can route /api → backend, /static → CDN.
Tools: Nginx, HAProxy, AWS ALB, Envoy, Traefik.
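As a concrete L7 example, here is a minimal Nginx sketch of path-based routing with least-connections balancing. Upstream names and addresses are placeholders, not a recommended production config.

```nginx
# L7 routing: inspect the HTTP path and send each prefix to a
# different destination. Hosts and ports here are placeholders.
upstream api_backend {
    least_conn;                # least-connections balancing
    server 10.0.0.11:3000;
    server 10.0.0.12:3000;
}

server {
    listen 80;

    location /api/ {
        proxy_pass http://api_backend;
    }

    location /static/ {
        # In practice static assets usually sit behind a CDN;
        # serving from disk shown here for completeness.
        root /var/www;
    }
}
```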
Sticky sessions: when a user must always hit the same server (avoid if possible — breaks horizontal scaling).
Database Scaling
Read replicas — Add read-only copies of the database. Write to the primary, read from replicas. Scales reads horizontally. (Most apps are read-heavy — often around 90% reads.)
Connection pooling — PgBouncer between app and DB. Multiplexes thousands of app connections into a few DB connections.
Query optimization — Indexes, avoiding N+1, proper EXPLAIN ANALYZE usage. Often the first step before infrastructure scaling.
Partitioning (Sharding) — Split data across multiple databases by a shard key (userId, region). Complex to implement. Use only when other options exhausted.
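The core of sharding is a deterministic mapping from shard key to shard. A sketch, assuming a simple string key — note that plain modulo reshards badly when the shard count changes; consistent hashing is the usual fix (not shown):

```typescript
// Pick a shard by hashing the shard key (e.g. userId) and taking it
// modulo the shard count. Deterministic: the same key always lands
// on the same shard.
function shardFor(shardKey: string, shardCount: number): number {
  let h = 0;
  for (const ch of shardKey) h = (h * 31 + ch.charCodeAt(0)) >>> 0;
  return h % shardCount;
}
```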
CQRS — Command Query Responsibility Segregation. Separate write model (commands) from read model (queries). Different data stores optimized for each.
Read/write splitting: Route queries to replicas automatically via middleware or ORM config.
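Read/write splitting can be sketched as a thin router in front of the driver: writes go to the primary, reads are round-robined across replicas. `DbClient` is a stub interface for illustration, not a real driver's API.

```typescript
interface DbClient { query(sql: string): Promise<string>; }

// Route SELECTs to replicas (round robin), everything else to primary.
class SplitRouter {
  private i = 0;
  constructor(private primary: DbClient, private replicas: DbClient[]) {}
  query(sql: string): Promise<string> {
    const isRead = /^\s*select\b/i.test(sql);
    if (!isRead || this.replicas.length === 0) return this.primary.query(sql);
    const replica = this.replicas[this.i++ % this.replicas.length];
    return replica.query(sql);
  }
}
```

One caveat worth knowing: replicas lag the primary slightly, so read-your-own-writes flows may need to pin to the primary.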
Performance Optimization Checklist
In order of ROI (do first what yields most):
- Database indexes — Index every foreign key and common WHERE column.
- N+1 query elimination — Use JOINs or batch loading.
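The batch-loading half of that fix looks like this: instead of one author query per post (N+1), collect the ids and fetch them in a single query. `fetchUsersByIds` is a hypothetical data-access function standing in for `SELECT ... WHERE id IN (...)`.

```typescript
type Post = { id: number; authorId: number };
type User = { id: number; name: string };

// Attach authors to posts with exactly one user query, not N.
async function attachAuthors(
  posts: Post[],
  fetchUsersByIds: (ids: number[]) => Promise<User[]>,
): Promise<Array<Post & { author?: User }>> {
  const ids = [...new Set(posts.map((p) => p.authorId))]; // dedupe ids
  const users = await fetchUsersByIds(ids);               // one batch query
  const byId = new Map(users.map((u) => [u.id, u]));
  return posts.map((p) => ({ ...p, author: byId.get(p.authorId) }));
}
```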
- Caching — Redis in front of expensive queries.
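The usual pattern here is cache-aside: check the cache, fall back to the expensive query on a miss, then populate the cache with a TTL. A Map stands in for Redis in this sketch; names are illustrative.

```typescript
// Tiny TTL cache standing in for Redis.
class Cache {
  private store = new Map<string, { value: string; expires: number }>();
  get(key: string): string | undefined {
    const hit = this.store.get(key);
    if (!hit || hit.expires < Date.now()) return undefined;
    return hit.value;
  }
  set(key: string, value: string, ttlMs: number) {
    this.store.set(key, { value, expires: Date.now() + ttlMs });
  }
}

// Cache-aside: hit → return cached; miss → query once, then cache.
async function cachedQuery(
  cache: Cache,
  key: string,
  expensiveQuery: () => Promise<string>,
): Promise<string> {
  const hit = cache.get(key);
  if (hit !== undefined) return hit;
  const value = await expensiveQuery();
  cache.set(key, value, 60_000); // 60s TTL, chosen arbitrarily here
  return value;
}
```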
- Connection pooling — PgBouncer, Redis connection pool.
- Async where possible — Non-blocking I/O, background jobs.
- Pagination — Never return unlimited lists.
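For large tables, keyset (cursor) pagination beats OFFSET, which degrades as the offset grows. A sketch over an in-memory array standing in for `SELECT * FROM t WHERE id > $1 ORDER BY id LIMIT $2`:

```typescript
type Row = { id: number; body: string };

// Page by "id greater than the last id seen" — each page is a cheap
// index seek regardless of how deep into the table you are.
function pageAfter(rows: Row[], afterId: number, limit: number): Row[] {
  return rows
    .filter((r) => r.id > afterId)
    .sort((a, b) => a.id - b.id)
    .slice(0, limit);
}
```

The client passes back the last `id` it received as the cursor for the next page.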
- Response compression — gzip/brotli on all text responses.
- HTTP/2 — Multiplexing cuts latency for many small requests.
- CDN — Static assets and cacheable API responses at the edge.
- Read replicas — Scale reads horizontally.
- Horizontal scaling — Add instances behind load balancer.
- Sharding — Last resort for data too large for one server.
The Backend from First Principles series is based on what I learnt from Sriniously's YouTube playlist — a thoughtful, framework-agnostic walk through backend engineering. If this material helped you, please go check the original out: youtube.com/@Sriniously. The notes here are my own restatement for revisiting later.