The naive approach to schema migration is: stop the service, run the migration, start the service. This works fine for a personal project. In production, “stop the service” means downtime, and downtime means angry users and broken SLAs.
Zero-downtime migration is a discipline, not a single technique. It requires thinking about the transition window — the period where old code and new code run simultaneously against the same database.
The Expand / Contract Pattern
Every zero-downtime migration follows the same shape:
- Expand — add new structure without removing old structure
- Migrate — backfill data, dual-write
- Contract — remove the old structure once all code uses the new
This can span multiple deployments. That’s fine. Schema cleanup is not urgent.
Renaming a Column Safely
Renaming users.name to users.full_name seems trivial. It is not.
Step 1 — Expand: add the new column
ALTER TABLE users ADD COLUMN full_name TEXT;
Deploy code that writes to both name and full_name. Reads still use name.
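The dual-write step lives in application code, not SQL. A minimal sketch, using SQLite in place of PostgreSQL and a hypothetical save_user helper:

```python
import sqlite3

# Hypothetical schema: users already had name; the expand step added full_name.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, full_name TEXT)")

def save_user(conn, user_id, name):
    # Dual-write: during the transition window every write touches BOTH
    # columns, so rows created by new code never need backfilling.
    conn.execute(
        "INSERT INTO users (id, name, full_name) VALUES (?, ?, ?) "
        "ON CONFLICT(id) DO UPDATE SET name = excluded.name, full_name = excluded.full_name",
        (user_id, name, name),
    )
    conn.commit()

save_user(conn, 1, "Ada Lovelace")
row = conn.execute("SELECT name, full_name FROM users WHERE id = 1").fetchone()
```

The point is that the write path is changed first, while reads are untouched; either column can now serve as the source of truth.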
Step 2 — Backfill existing rows
UPDATE users
SET full_name = name
WHERE full_name IS NULL;
Do this in batches on large tables. A single UPDATE on millions of rows holds locks on every updated row for its entire duration:
DO $$
DECLARE
  batch_size INT := 5000;
  updated INT;
BEGIN
  LOOP
    UPDATE users
    SET full_name = name
    WHERE id IN (
      SELECT id FROM users
      WHERE full_name IS NULL
      LIMIT batch_size
    );
    GET DIAGNOSTICS updated = ROW_COUNT;
    EXIT WHEN updated = 0;
    COMMIT; -- PG 11+: actually release locks between batches; run the DO
            -- block outside a transaction, or batching buys you nothing
    PERFORM pg_sleep(0.05); -- breathe
  END LOOP;
END $$;
END $$;
Step 3 — Flip reads
Deploy code that reads full_name. Keep writing to both columns.
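If the read flip happens while the backfill is still running, reads can defensively fall back to the old column with COALESCE. A sketch under the same SQLite stand-in, with a hypothetical display_name helper:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, full_name TEXT)")
# One row already backfilled, one not yet reached by the batch job.
conn.execute("INSERT INTO users VALUES (1, 'Ada', 'Ada'), (2, 'Grace', NULL)")

def display_name(conn, user_id):
    # COALESCE prefers the new column but tolerates rows the backfill has
    # not reached yet, so the read flip cannot race the backfill.
    row = conn.execute(
        "SELECT COALESCE(full_name, name) FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    return row[0]

names = [display_name(conn, 1), display_name(conn, 2)]
```

Once the backfill is confirmed complete, the fallback can be removed.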
Step 4 — Contract: drop the old column
Deploy code that stops writing to name. Only after that deployment has fully rolled out, drop the column:
ALTER TABLE users DROP COLUMN name;
Adding a NOT NULL Column
Adding a column declared NOT NULL with no default fails outright on a non-empty table, and before PostgreSQL 11, adding a column with a default rewrote the entire table. On PostgreSQL 11+ a constant (non-volatile) default is a metadata-only change, but a volatile default such as clock_timestamp() still triggers a rewrite, and ALTER COLUMN ... SET NOT NULL on its own scans the whole table under an ACCESS EXCLUSIVE lock.
Safe pattern:
-- 1. Add nullable
ALTER TABLE orders ADD COLUMN processed_at TIMESTAMPTZ;
-- 2. Backfill
UPDATE orders SET processed_at = created_at WHERE processed_at IS NULL;
-- 3. Add the constraint without checking existing rows (NOT VALID is instant)
ALTER TABLE orders
ADD CONSTRAINT orders_processed_at_not_null
CHECK (processed_at IS NOT NULL) NOT VALID;
-- 4. Validate later (takes only a SHARE UPDATE EXCLUSIVE lock; reads and writes continue)
ALTER TABLE orders
VALIDATE CONSTRAINT orders_processed_at_not_null;
On PostgreSQL 12+ you can then run ALTER TABLE orders ALTER COLUMN processed_at SET NOT NULL, which uses the validated CHECK constraint to skip the table scan, and drop the now-redundant CHECK.
Indexing Without Locking
CREATE INDEX takes a SHARE lock on the table, which blocks all writes (reads continue). Use CONCURRENTLY:
CREATE INDEX CONCURRENTLY idx_orders_user_id ON orders(user_id);
This takes longer and cannot run inside a transaction block, but it holds only a weaker lock that allows concurrent reads and writes. The trade-off: if the build fails, you are left with an INVALID index (indisvalid = false in pg_index) that must be dropped and rebuilt.
The Golden Rules
- Never remove a column in the same deployment that stops writing to it. Old pods may still be running.
- Backfill in batches, committing between them. A single large UPDATE holds its row locks until it finishes.
- Use NOT VALID + VALIDATE CONSTRAINT for deferred enforcement.
- Always use CREATE INDEX CONCURRENTLY.
- Test migrations against a production-sized dataset in staging, not a 10-row dev database.
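The batching rule can also be driven from the application, with a real commit between batches so locks are released and progress survives interruption. A sketch with SQLite standing in for PostgreSQL (table contents and batch size are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, full_name TEXT)")
conn.executemany("INSERT INTO users (id, name) VALUES (?, ?)",
                 [(i, f"user-{i}") for i in range(1, 12001)])  # 12,000 rows
conn.commit()

BATCH = 5000

def backfill(conn):
    batches = 0
    while True:
        cur = conn.execute(
            "UPDATE users SET full_name = name WHERE id IN "
            "(SELECT id FROM users WHERE full_name IS NULL LIMIT ?)",
            (BATCH,),
        )
        conn.commit()  # commit per batch: locks released, progress durable
        if cur.rowcount == 0:
            break      # nothing left to backfill
        batches += 1
    return batches

batches = backfill(conn)   # 12,000 rows at 5,000 per batch -> 3 batches
remaining = conn.execute(
    "SELECT COUNT(*) FROM users WHERE full_name IS NULL").fetchone()[0]
```

Because each batch re-selects `WHERE full_name IS NULL`, the job is idempotent: killing and restarting it mid-run resumes where it left off.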