A normal day was a few thousand requests an hour. A drop was a hundred thousand people arriving in the same second, all wanting the exact same row in the database. Traffic like that doesn’t ramp. It detonates.
Here’s what actually held.
Treat the spike as the default
The mistake is designing for the average and bolting on scale later. For a drop, the spike is the product. Everything quiet is just rehearsal.
So the question stopped being “can it handle normal load” and became “what happens in the first 800ms after the gate opens.”
Make the hot path boring
The fewer moving parts in the critical second, the better:
- Cache anything that isn’t the purchase itself, aggressively, at the edge.
- Keep the write that matters small, single-purpose, and idempotent.
- Put a queue in front of the thing that can’t be cached, and let people wait in a line that actually moves.
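Here is a minimal sketch of the second point, the small, idempotent write, in Python. Everything in it is an assumption for illustration. `Inventory`, `purchase`, `Receipt`, and the client-supplied idempotency key are hypothetical names. A `threading.Lock` around an in-memory counter stands in for what would really be a single database transaction on the hot row.

```python
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class Receipt:
    key: str     # the idempotency key the client sent
    status: str  # "confirmed" or "sold_out"


class Inventory:
    """Hypothetical in-memory stand-in for the one row everyone wants.

    The lock plays the role of the database transaction that would
    guard the real row.
    """

    def __init__(self, item_id: str, stock: int) -> None:
        self.item_id = item_id
        self.stock = stock
        self._receipts: dict[str, Receipt] = {}  # idempotency key -> first answer
        self._lock = threading.Lock()

    def purchase(self, idempotency_key: str) -> Receipt:
        with self._lock:
            # A retry with the same key gets the original answer back
            # instead of decrementing stock a second time.
            if idempotency_key in self._receipts:
                return self._receipts[idempotency_key]
            if self.stock > 0:
                self.stock -= 1
                receipt = Receipt(idempotency_key, "confirmed")
            else:
                receipt = Receipt(idempotency_key, "sold_out")
            # Remember the outcome either way, so every retry sees the same result.
            self._receipts[idempotency_key] = receipt
            return receipt
```

In this sketch the sold-out answer is recorded as well as the confirmed one. A client retrying over a flaky connection therefore always gets the same answer, and one click can never become two orders.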
A queue feels slow. A crash feels like betrayal. People will forgive a line. They won’t forgive a white screen at the moment they were promised.
Fail in a way people trust
If it has to break, break loudly, early, and honestly.
A clear “you’re number 4,213 in line” beats a fast page that silently drops every other order. The first one keeps trust. The second one ends it.
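Here is one way to sketch a line that is honest about where you stand, again in Python and again hypothetical. `WaitingLine`, `capacity`, `join`, and `admit` are illustrative names, not a specific product’s API. The line gives each person a 1-based position. Once it is full, it turns new arrivals away immediately with a plain message. It lets the front of the line through in batches sized to whatever the purchase write can absorb.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Ticket:
    accepted: bool
    position: Optional[int]  # 1-based place in line; None if turned away
    message: str


class WaitingLine:
    """Hypothetical bounded FIFO line in front of the purchase write."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._line: deque[str] = deque()
        self._joined = 0   # everyone who has ever been let into the line
        self._served = 0   # everyone who has left the front of the line
        self._number: dict[str, int] = {}  # user -> their join number

    def join(self, user_id: str) -> Ticket:
        # Joining twice keeps your original place instead of resetting it.
        if user_id in self._number:
            return self._ticket(user_id)
        if len(self._line) >= self.capacity:
            # Refuse early and say so, rather than accept and silently drop later.
            return Ticket(False, None, "The line is full. Please try again later.")
        self._joined += 1
        self._number[user_id] = self._joined
        self._line.append(user_id)
        return self._ticket(user_id)

    def position(self, user_id: str) -> int:
        # Join number minus people already served gives an O(1) position.
        return self._number[user_id] - self._served

    def admit(self, batch_size: int) -> list[str]:
        """Let up to batch_size people from the front through to checkout."""
        admitted = []
        while self._line and len(admitted) < batch_size:
            user_id = self._line.popleft()
            del self._number[user_id]
            self._served += 1
            admitted.append(user_id)
        return admitted

    def _ticket(self, user_id: str) -> Ticket:
        pos = self.position(user_id)
        return Ticket(True, pos, f"You're number {pos:,} in line.")
```

Because positions come from two counters rather than a scan of the line, the message stays cheap to compute even with hundreds of thousands of people waiting.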
The systems that survived weren’t the cleverest. They were the ones that stayed honest under pressure.