Hello, World
Welcome to Zero Downtime. This is a placeholder post used to validate the
Phase 3 layout system: typography, post metadata, code highlighting, lists,
and the <!--more--> excerpt split.
What I’ll write about
I work on systems that have to stay up — distributed databases, message queues, deployment pipelines, the operational glue around all of it. Posts here will mostly cover:
- Failure analysis — postmortems abstracted of names and blame.
- Reliability patterns — backpressure, idempotency, exactly-once illusions, and where their assumptions break.
- Tooling — small notes on instruments I find indispensable.
A code sample, for testing
def with_retry(fn, attempts=3, backoff=0.2):
for i in range(attempts):
try:
return fn()
except TransientError:
if i == attempts - 1:
raise
time.sleep(backoff * (2 ** i))
A short inline code snippet for good measure.
A blockquote
Anything that can go wrong, will go wrong — at exactly the moment your on-call rotation hands off.
That’s all for now.