Module 10 • Data • 16 min read

Databases

ACID, indexing, N+1, migrations, connection pooling — the database fundamentals.

Relational vs Non-Relational

Relational (SQL) — PostgreSQL, MySQL, SQLite
• Data in tables with rows and columns
• Strong schema enforcement
• ACID transactions
• Powerful JOIN queries
• Best for: structured data with relationships, financial systems, anything needing consistency

Document (NoSQL) — MongoDB, DynamoDB
• Data as JSON documents in collections
• Flexible schema
• Horizontal scaling built-in
• Best for: varying structures, hierarchical data, high write throughput

Others:
• Key-Value: Redis, DynamoDB (caching, sessions)
• Wide-Column: Cassandra (time-series, IoT)
• Graph: Neo4j (social networks, recommendation engines)
• Time-Series: InfluxDB, TimescaleDB (metrics, monitoring)

Most apps: PostgreSQL + Redis. Add others only for specific needs.

ACID Transactions

ACID guarantees that database operations are reliable:

Atomicity — Either ALL operations in a transaction succeed, or NONE do. No partial updates.
Example: Transfer $100 from Alice to Bob. Debit Alice AND credit Bob must both succeed or both fail.

Consistency — The database is always in a valid state. Constraints (foreign keys, unique indexes) are enforced.

Isolation — Concurrent transactions don't see each other's partial work. Multiple users editing simultaneously don't corrupt each other's data.

Durability — Once committed, data survives crashes. Written to disk, not just memory.

Isolation levels (from weakest to strongest):
Read Uncommitted → Read Committed → Repeatable Read → Serializable

PostgreSQL default: Read Committed. For financial operations: Serializable.

Indexing

Without an index, a query does a full table scan — O(n) where n is row count. An index is a sorted data structure (B-Tree) that allows O(log n) lookups.

Always index:
• Primary keys (automatic)
• Foreign keys
• Columns used in WHERE clauses
• Columns used in JOIN conditions
• Columns used in ORDER BY (if queried frequently)

Composite indexes: index on (a, b) helps queries on a alone or a+b together, NOT b alone. Order in the index matters.

Partial indexes: index WHERE status = 'pending' — only indexes a subset of rows.

Unique indexes: enforce uniqueness at the database level (not just application level).

Cost: Indexes speed up reads but slow down writes (must update the index). Too many indexes on a write-heavy table is a problem.

N+1 Query Problem

The most common ORM pitfall:

JavaScript

// Bad: N+1 queries
const users = await db.users.findMany(); // 1 query
for (const user of users) {
  user.posts = await db.posts.findMany({ where: { userId: user.id } }); // N queries
}
// If 100 users → 101 queries to the database!

// Good: 2 queries (or 1 JOIN)
const users = await db.users.findMany({
  include: { posts: true }  // JOIN or batch load in one query
});

OR use a JOIN:

SQL

SELECT users.*, posts.*
FROM users
LEFT JOIN posts ON posts.user_id = users.id
WHERE users.id IN (1, 2, 3);

Always monitor query count in development. ORMs make N+1 dangerously easy to write.

Migrations

A migration is a versioned, ordered script that changes the database schema. It's the version control for your database.

Never modify the database manually in production. Write a migration:

SQL

-- 20240115_add_user_phone.sql
ALTER TABLE users ADD COLUMN phone VARCHAR(20);
CREATE INDEX idx_users_phone ON users(phone);

Migration tools: Flyway, Liquibase, Alembic (Python), golang-migrate, Prisma Migrate.

Rules:
• Migrations are immutable — never edit a committed migration
• Make migrations reversible when possible (include rollback)
• Run migrations before deploying new code
• In CI/CD, apply migrations as part of deployment
• Never drop columns immediately — mark deprecated, then remove after old code is gone

Connection Pooling

Opening a database connection is expensive (~100ms). Creating a new one per request would kill performance.

A connection pool maintains a set of pre-opened connections. When your code needs to query, it borrows one from the pool, uses it, and returns it.

Configuration that matters:
• min connections: always keep this many open (2-5 typical)
• max connections: cap on simultaneous connections (match DB max_connections)
• acquire timeout: how long to wait for a connection before throwing an error
• idle timeout: close connections unused for this long

PostgreSQL max_connections default: 100. With 10 app servers, max pool size = 9 each (leave headroom for admin).

Use PgBouncer as a connection pooler between your app and PostgreSQL in high-concurrency setups.

★

Source & Credit

The Backend from First Principles series is based on what I learnt from Sriniously's YouTube playlist — a thoughtful, framework-agnostic walk through backend engineering. If this material helped you, please go check the original out: youtube.com/@Sriniously. The notes here are my own restatement for revisiting later.

⁂ Back to all modules