§ whyisthisdown // notes from the night shift

Observability, incidents,
and reliable systems — in practice.

Writing for engineering teams on observability, incident analysis, and building resilient systems. No frameworks-of-the-week — just what breaks and how to tell.

8 articles published

Latest May 24, 2026

Sixteen Months

Sixteen months of relief — except the two soonest obligations are still seven months out, and Article 12's logging requirement is an architecture decision you can't defer. Brussels blinked. You can't.

[governance][eu-ai-act][observability]

9 min read

§ latest writing


May 04, 2026 27 min read ai-agents mcp governance

Nine Seconds

Nine seconds. One curl. A startup's production data and all its backups, gone. The agent wrote a confession afterward — and the confession was the least useful part. Same architecture failure, third time this year.

Apr 27, 2026 23 min read mcp governance security

The MCP Trust Deficit

Twenty-two thousand MCP servers. Zero mandatory security checks. The protocol won — the trust layer never shipped. An audit of what's actually exposed.

Apr 19, 2026 7 min read observability incidents governance

Three Ways Your API Lies: Lessons from GitHub's Rough Week

Four GitHub incidents in five days. Three are the same failure wearing different masks — stale caches, ghost state, retry storms. That pattern is probably in your stack too.

Apr 16, 2026 17 min read observability llm post-mortem

LLM-Assisted Post-Mortems: The Streetlight Effect, Industrialized

You pasted logs into ChatGPT and got a plausible RCA. It's wrong. What changes when your LLM can query the observability stack directly — and what new failure modes that creates.

Feb 09, 2026 9 min read mcp benchmarks performance

Benchmarking MCP Tool Calls: Three Findings That Aren't 'Parallel Is Faster'

5,300 measurements, 6 scenarios. The headline is a 19.6× speedup, but the real findings are that stdio isn't serial, framework overhead is zero, and a hardcoded constant was capping your throughput.

Jan 15, 2026 8 min read mcp governance security

The MCP Governance Problem Nobody's Talking About

Everyone's plugging unvetted MCP servers into production LLMs. Nobody's asking who's liable when they leak credentials or delete data. The governance gap enterprises are ignoring.
