Databases
ACID, indexing, N+1, migrations, connection pooling — the database fundamentals.
Relational vs Non-Relational
Relational (SQL) — PostgreSQL, MySQL, SQLite
• Data in tables with rows and columns
• Strong schema enforcement
• ACID transactions
• Powerful JOIN queries
• Best for: structured data with relationships, financial systems, anything needing consistency
Document (NoSQL) — MongoDB, DynamoDB
• Data as JSON documents in collections
• Flexible schema
• Horizontal scaling built-in
• Best for: varying structures, hierarchical data, high write throughput
Others:
• Key-Value: Redis, DynamoDB (caching, sessions)
• Wide-Column: Cassandra (time-series, IoT)
• Graph: Neo4j (social networks, recommendation engines)
• Time-Series: InfluxDB, TimescaleDB (metrics, monitoring)
Most apps: PostgreSQL + Redis. Add others only for specific needs.
ACID Transactions
ACID guarantees that database operations are reliable:
Atomicity — Either ALL operations in a transaction succeed, or NONE do. No partial updates.
Example: Transfer $100 from Alice to Bob. Debit Alice AND credit Bob must both succeed or both fail.
Consistency — The database is always in a valid state. Constraints (foreign keys, unique indexes) are enforced.
Isolation — Concurrent transactions don't see each other's partial work. Multiple users editing simultaneously don't corrupt each other's data.
Durability — Once committed, data survives crashes. Written to disk, not just memory.
Isolation levels (from weakest to strongest):
Read Uncommitted → Read Committed → Repeatable Read → Serializable
PostgreSQL default: Read Committed. For financial operations: Serializable.
Indexing
Without an index, a query does a full table scan — O(n) where n is row count. An index is a sorted data structure (B-Tree) that allows O(log n) lookups.
Always index:
• Primary keys (automatic)
• Foreign keys
• Columns used in WHERE clauses
• Columns used in JOIN conditions
• Columns used in ORDER BY (if queried frequently)
Composite indexes: index on (a, b) helps queries on a alone or a+b together, NOT b alone. Order in the index matters.
Partial indexes: index WHERE status = 'pending' — only indexes a subset of rows.
Unique indexes: enforce uniqueness at the database level (not just application level).
Cost: Indexes speed up reads but slow down writes (must update the index). Too many indexes on a write-heavy table is a problem.
N+1 Query Problem
The most common ORM pitfall:
// Bad: N+1 queries
const users = await db.users.findMany(); // 1 query
for (const user of users) {
user.posts = await db.posts.findMany({ where: { userId: user.id } }); // N queries
}
// If 100 users → 101 queries to the database!
// Good: 2 queries (or 1 JOIN)
const users = await db.users.findMany({
include: { posts: true } // JOIN or batch load in one query
});
OR use a JOIN:
SELECT users.*, posts.*
FROM users
LEFT JOIN posts ON posts.user_id = users.id
WHERE users.id IN (1, 2, 3);
Always monitor query count in development. ORMs make N+1 dangerously easy to write.
Migrations
A migration is a versioned, ordered script that changes the database schema. It's the version control for your database.
Never modify the database manually in production. Write a migration:
-- 20240115_add_user_phone.sql
ALTER TABLE users ADD COLUMN phone VARCHAR(20);
CREATE INDEX idx_users_phone ON users(phone);
Migration tools: Flyway, Liquibase, Alembic (Python), golang-migrate, Prisma Migrate.
Rules:
• Migrations are immutable — never edit a committed migration
• Make migrations reversible when possible (include rollback)
• Run migrations before deploying new code
• In CI/CD, apply migrations as part of deployment
• Never drop columns immediately — mark deprecated, then remove after old code is gone
Connection Pooling
Opening a database connection is expensive (~100ms). Creating a new one per request would kill performance.
A connection pool maintains a set of pre-opened connections. When your code needs to query, it borrows one from the pool, uses it, and returns it.
Configuration that matters:
• min connections: always keep this many open (2-5 typical)
• max connections: cap on simultaneous connections (match DB max_connections)
• acquire timeout: how long to wait for a connection before throwing an error
• idle timeout: close connections unused for this long
PostgreSQL max_connections default: 100. With 10 app servers, max pool size = 9 each (leave headroom for admin).
Use PgBouncer as a connection pooler between your app and PostgreSQL in high-concurrency setups.
The Backend from First Principles series is based on what I learnt from Sriniously's YouTube playlist — a thoughtful, framework-agnostic walk through backend engineering. If this material helped you, please go check the original out: youtube.com/@Sriniously. The notes here are my own restatement for revisiting later.