Serialization & Deserialization
JSON vs Protobuf. Schema validation. The gotchas that bite in production.
The Core Problem
Data in your program lives in memory as rich objects — with types, methods, references. Networks transmit raw bytes. Serialization converts in-memory objects → bytes. Deserialization converts bytes → objects.
This happens on every single API call:
1. Client serializes request body to JSON bytes
2. Server deserializes JSON bytes to language objects
3. Server processes, then serializes result back to JSON bytes
4. Client deserializes the response
Understanding this loop tells you where bugs, performance issues, and security vulnerabilities hide.
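The four steps above can be sketched in a few lines of TypeScript. This is a minimal in-process illustration, not a real client or server: the CreateUserRequest shape and the serialize/deserialize helper names are made up for the example, and the "network" is just a byte array passed between calls.

```typescript
// Hypothetical request shape, used only for illustration.
interface CreateUserRequest {
  email: string;
  age: number;
}

// Serialization: in-memory object -> JSON text -> UTF-8 bytes.
function serialize(value: unknown): Uint8Array {
  return new TextEncoder().encode(JSON.stringify(value));
}

// Deserialization: UTF-8 bytes -> JSON text -> plain value.
// The result is typed `unknown` because nothing has validated it yet.
function deserialize(bytes: Uint8Array): unknown {
  return JSON.parse(new TextDecoder().decode(bytes));
}

const request: CreateUserRequest = { email: "ada@example.com", age: 36 };

const requestBytes = serialize(request); // step 1: client serializes
const received = deserialize(requestBytes); // step 2: server deserializes
const responseBytes = serialize({ ok: true, echo: received }); // step 3: server serializes result
const response = deserialize(responseBytes); // step 4: client deserializes

console.log(requestBytes.length); // number of bytes that would go on the wire
console.log(response);
```

Returning unknown from deserialize is a deliberate choice: it forces the caller to validate before using the data, which is where the later sections pick up.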
JSON — The Default
JSON (JavaScript Object Notation) is the de facto default format for backend APIs: human-readable, language-agnostic, and supported everywhere.
Supported types:
• string, number, boolean, null
• object (key-value pairs)
• array
What JSON can't represent:
• Dates (send as ISO 8601 strings: "2024-01-15T10:30:00Z")
• Binary data (base64 encode it)
• undefined (omit the key or use null)
• Circular references (serializers throw, e.g. JSON.stringify raises a TypeError)
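Each limitation in the list above can be seen directly with the built-in JSON object. This is a small sketch with made-up sample values; btoa is assumed to be available as a global (browsers and recent Node.js versions).

```typescript
// Dates: JSON.stringify calls Date.prototype.toJSON, which yields an ISO 8601 string.
const createdAt = new Date(Date.UTC(2024, 0, 15, 10, 30, 0));
const dateJson = JSON.stringify({ createdAt });
// dateJson is '{"createdAt":"2024-01-15T10:30:00.000Z"}'

// After parsing, the value is a plain string; turning it back into a Date is up to you.
const parsedDate = JSON.parse(dateJson) as { createdAt: string };
const revived = new Date(parsedDate.createdAt);

// Binary data: base64-encode the bytes so they fit inside a JSON string.
const rawBytes = new Uint8Array([0xde, 0xad, 0xbe, 0xef]);
const binaryString = Array.from(rawBytes, (b) => String.fromCharCode(b)).join("");
const base64 = btoa(binaryString); // "3q2+7w=="

// undefined: object keys holding undefined are dropped; array slots become null.
const droppedKey = JSON.stringify({ a: 1, b: undefined }); // '{"a":1}'
const nullSlot = JSON.stringify([1, undefined]); // '[1,null]'

// Circular references: JSON.stringify throws a TypeError.
const node: { name: string; self?: unknown } = { name: "loop" };
node.self = node;
let circularError = "none";
try {
  JSON.stringify(node);
} catch (err) {
  circularError = err instanceof TypeError ? "TypeError" : "other";
}

console.log(dateJson, revived.getTime(), base64, droppedKey, nullSlot, circularError);
```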
Parsing untrusted JSON is an attack surface: validate the structure before you trust any key or value.
JSON vs Other Formats
JSON: Universal, human-readable, slightly verbose. Default for REST APIs.
XML: Verbose and older, still used in enterprise/SOAP systems. Avoid for new systems.
Protocol Buffers (Protobuf): Binary format by Google with a typed schema (.proto files). Often cited as roughly 3-10x smaller and 5-10x faster to parse than JSON, though the actual gain depends on payload shape and implementation. Used in gRPC and internal microservices. Not human-readable.
MessagePack: Binary encoding of a JSON-like data model. No schema required, smaller payloads than JSON. Good middle ground.
CBOR: Concise Binary Object Representation (RFC 8949). Used in IoT and in security formats such as COSE and WebAuthn.
Rule: Use JSON for external APIs. Use Protobuf/MessagePack for internal high-throughput services where performance matters.
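To see why schema-based binary formats come out smaller, here is a rough sketch comparing JSON with a hand-rolled fixed-layout encoding. This is not Protobuf's actual wire format (Protobuf uses field tags and varints); the Reading shape and the 13-byte layout are assumptions made up for the illustration. The point is only that when both sides agree on the schema up front, field names and punctuation no longer travel with every message.

```typescript
// Hypothetical record, for illustration only.
interface Reading {
  sensorId: number; // assumed to fit in an unsigned 32-bit integer
  temperature: number; // stored as a 64-bit float
  ok: boolean;
}

// JSON repeats field names and punctuation in every message.
function encodeJson(r: Reading): Uint8Array {
  return new TextEncoder().encode(JSON.stringify(r));
}

// Fixed layout: 4 bytes sensorId + 8 bytes temperature + 1 byte ok = 13 bytes.
// Only the values are sent; the layout itself is the shared "schema".
function encodeBinary(r: Reading): Uint8Array {
  const buf = new ArrayBuffer(13);
  const view = new DataView(buf);
  view.setUint32(0, r.sensorId);
  view.setFloat64(4, r.temperature);
  view.setUint8(12, r.ok ? 1 : 0);
  return new Uint8Array(buf);
}

function decodeBinary(bytes: Uint8Array): Reading {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  return {
    sensorId: view.getUint32(0),
    temperature: view.getFloat64(4),
    ok: view.getUint8(12) === 1,
  };
}

const reading: Reading = { sensorId: 4021, temperature: 21.5, ok: true };
console.log("JSON bytes:", encodeJson(reading).length);
console.log("binary bytes:", encodeBinary(reading).length);
console.log(decodeBinary(encodeBinary(reading)));
```

The trade-off is visible too: the binary bytes are meaningless without the layout, which is why the notes above recommend JSON for external APIs where readability and tooling matter more.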
Schema Validation on Deserialization
Never trust what the client sends. After deserializing, validate:
- Required fields are present
- Types are correct (string, not number)
- Values are in allowed ranges
- Strings match expected patterns (email, UUID)
- Arrays don't exceed max length
- No unexpected extra fields (strip or reject)
Tools: Zod (TypeScript), Pydantic (Python), class-validator (TypeScript, commonly used with NestJS), JSON Schema (language-agnostic).
Example with Zod:
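import { z } from "zod";

// Describe the expected shape once; Zod checks it at runtime.
const schema = z.object({
  email: z.string().email(),
  age: z.number().int().min(18).max(120),
  role: z.enum(["user", "admin"]),
});

// req is assumed to be an Express-style request. parse() throws a ZodError if invalid.
const data = schema.parse(req.body);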
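For a sense of what a validation library does on your behalf, here is a dependency-free sketch that checks the same three fields by hand. The validateSignup name, the Result type, and the deliberately loose email pattern are all assumptions for illustration; in practice a library like the ones listed above is usually the better choice.

```typescript
type Role = "user" | "admin";

interface SignupInput {
  email: string;
  age: number;
  role: Role;
}

type Result =
  | { ok: true; value: SignupInput }
  | { ok: false; errors: string[] };

const ALLOWED_KEYS = new Set(["email", "age", "role"]);
// Deliberately loose pattern; real email validation is more involved.
const EMAIL_PATTERN = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;

function validateSignup(input: unknown): Result {
  // The body must be a plain JSON object, not null, an array, or a primitive.
  if (typeof input !== "object" || input === null || Array.isArray(input)) {
    return { ok: false, errors: ["body must be a JSON object"] };
  }
  const body = input as Record<string, unknown>;
  const errors: string[] = [];

  // Reject unexpected extra fields.
  for (const key of Object.keys(body)) {
    if (!ALLOWED_KEYS.has(key)) errors.push(`unexpected field: ${key}`);
  }

  // Required fields, correct types, allowed ranges, expected patterns.
  const { email, age, role } = body;
  if (typeof email !== "string" || !EMAIL_PATTERN.test(email)) {
    errors.push("email must be a valid email string");
  }
  if (typeof age !== "number" || !Number.isInteger(age) || age < 18 || age > 120) {
    errors.push("age must be an integer between 18 and 120");
  }
  if (role !== "user" && role !== "admin") {
    errors.push('role must be "user" or "admin"');
  }

  if (errors.length > 0) return { ok: false, errors };
  return { ok: true, value: { email: email as string, age: age as number, role: role as Role } };
}

console.log(validateSignup({ email: "ada@example.com", age: 36, role: "admin" }));
console.log(validateSignup({ email: "ada@example.com", age: "36", role: "root", debug: true }));
```

Collecting every error instead of stopping at the first one lets the API return a complete list to the client in a single response.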
Serialization Gotchas
1. Date handling: Store as UTC, serialize as ISO 8601. Avoid bare integer timestamps in public APIs, since consumers cannot tell seconds from milliseconds.
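2. Large numbers: The JSON spec puts no limit on number size, but many parsers (JavaScript included) store numbers as 64-bit floats, so integers beyond 2^53 - 1 silently lose precision. Send large IDs as strings.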
3. Null vs undefined vs missing: Decide a convention and stick to it. A common one: null means "intentionally empty," a missing key means "not provided."
4. Circular references: Serializers throw on cycles (JSON.stringify raises a TypeError). Use a DAG (directed acyclic graph) structure or break cycles explicitly, for example by replacing the back-reference with an ID.
5. Sensitive fields: Strip passwords, tokens, and internal IDs from responses. Use a response DTO separate from your database model.
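Three of the gotchas above (large numbers, null vs missing, and sensitive fields) are easy to demonstrate in a short sketch. All type and function names here (ProfilePatch, describePatch, UserRow, UserResponse, toUserResponse) are hypothetical, chosen only for the example.

```typescript
// Gotcha 2: large integers are rounded when parsed into a double.
const asNumber = JSON.parse('{"id": 9007199254740993}') as { id: number };
const asString = JSON.parse('{"id": "9007199254740993"}') as { id: string };
// asNumber.id is now 9007199254740992: the last digit changed silently.
// asString.id is still "9007199254740993".

// Gotcha 3: distinguish "key missing" from "key present but null".
interface ProfilePatch {
  nickname?: string | null;
}

function describePatch(patch: ProfilePatch): string {
  if (!("nickname" in patch)) return "not provided: leave unchanged";
  if (patch.nickname === null) return "intentionally empty: clear the value";
  return `set to ${patch.nickname}`;
}

// Gotcha 5: map the database model to a response DTO with an explicit allow-list,
// so a newly added column never leaks by default.
interface UserRow {
  internalId: number;
  publicId: string;
  email: string;
  passwordHash: string;
}

interface UserResponse {
  id: string;
  email: string;
}

function toUserResponse(row: UserRow): UserResponse {
  return { id: row.publicId, email: row.email };
}

const row: UserRow = {
  internalId: 42,
  publicId: "usr_8f3a",
  email: "ada@example.com",
  passwordHash: "not-a-real-hash",
};

console.log(asNumber.id, asString.id);
console.log(describePatch(JSON.parse("{}")));
console.log(describePatch(JSON.parse('{"nickname": null}')));
console.log(JSON.stringify(toUserResponse(row)));
```

Building the DTO by copying allowed fields, rather than deleting forbidden ones from the row, means forgetting a field fails safe: the worst case is a missing value in the response, not a leaked secret.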
The Backend from First Principles series is based on what I learnt from Sriniously's YouTube playlist — a thoughtful, framework-agnostic walk through backend engineering. If this material helped you, please go check the original out: youtube.com/@Sriniously. The notes here are my own restatement for revisiting later.