The Agent With No Flight Recorder

When an aircraft falls out of the sky, the first thing anyone wants is the box. Not the wreckage, not the manifest, not the press statement — the orange box that recorded what the systems were doing in the seconds before everything went wrong. We treat the flight recorder as non-negotiable infrastructure because, decades ago, we decided that “we don’t know what happened” is an unacceptable answer when the stakes are high enough.

We have not decided that about AI agents. OpenClaw is the proof. And in May, the EU — which was about to force the decision — agreed to wait until December 2027 instead.

I’ve been keeping a running file on this failure. Nine Seconds was one agent, one startup, one production database and its backups gone before the founder could stand up. The MCP Trust Deficit was the supply chain underneath it — twenty-two thousand servers, no mandatory trust layer. OpenClaw is the same failure as both, scaled to a hundred thousand deployments, and it is conspicuously missing from everyone’s incident log because it got filed under the wrong heading.

The object, not the news

Start with the artifact, because the news cycle did it a disservice by treating it as a curiosity.

OpenClaw shipped in November 2025 as Clawdbot — a self-hosted, tool-using personal agent marketed, with admirable honesty, as “the AI that actually does things.” It ran locally against a model of your choice and wired itself into the surfaces where your life already lives: WhatsApp, Telegram, Signal, Discord, Slack, your calendar, your inbox, your browser, your shell. It read and wrote files, submitted forms, scheduled things, spent money, executed OS commands. And it remembered — persistent memory across sessions was a headline feature, not a footnote.

It went off like a flare. Inside three months it was the most-starred project on GitHub, having sailed past React on the way up. There was a reported run on Mac minis in US retail. A subculture of community-published “skills” appeared almost overnight. Anthropic, unamused by the proximity to “Claude,” sent a trademark demand; the project became Moltbot on January 27th and OpenClaw on the 29th. Somewhere in that window the project’s account on X was reportedly hijacked to push a crypto scam. The renaming saga is funny. The rest of January was not.

By the end of the month the disclosures were stacking:

CVE-2026-25253 (CVSS 8.8): a one-click remote code execution. The control UI trusted a gatewayUrl parameter without validating it; a crafted link, clicked once, exfiltrated the user’s auth token over a hijacked WebSocket and handed an attacker full control of the host. It worked even on instances bound to localhost, because the victim’s own browser was the pivot. Public proof-of-concept followed on GitHub within days.
CVE-2026-24763 (CVSS 8.8): a Docker sandbox escape via PATH manipulation — the sandbox you’d reach for precisely to contain something this dangerous.
CVE-2026-25157 (CVSS 7.8): OS command injection through the macOS SSH handling.
The control interface bound to 0.0.0.0 by default. Scanners found exposed panels at internet scale — SecurityScorecard counted over 15,000, Censys put it above 21,000 by January 31st — many leaking API keys and private messages. Roughly 78% were running outdated builds, still branded Clawdbot or Moltbot, from before any of the patches.
Several hundred malicious entries in the community skill directory. Secrets leaking out of “Moltbook,” the agent-to-agent social feed bolted onto the ecosystem. Plaintext credential storage, promptly farmed by infostealer campaigns.
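The first item on that list comes down to a client trusting a URL it was handed. As a rough sketch of the check that was missing (the allowlist, port number, and function name here are my own illustrative assumptions, not anything taken from OpenClaw's code), validating a gateway URL before connecting might look like this:

```python
from urllib.parse import urlsplit

# Hypothetical allowlist: the only (scheme, host, port) triples the UI may dial.
ALLOWED_GATEWAYS = {
    ("ws", "127.0.0.1", 8765),
    ("wss", "gateway.example.internal", 443),
}
DEFAULT_PORTS = {"ws": 80, "wss": 443}


def is_trusted_gateway(url: str) -> bool:
    """Return True only if scheme, host, and port all match the allowlist."""
    try:
        parts = urlsplit(url)
        port = parts.port  # raises ValueError on a malformed port
    except ValueError:
        return False
    scheme = parts.scheme.lower()
    if scheme not in DEFAULT_PORTS or parts.hostname is None:
        return False
    if parts.username or parts.password:
        # Reject userinfo tricks such as ws://127.0.0.1:8765@evil.example/
        return False
    if port is None:
        port = DEFAULT_PORTS[scheme]
    return (scheme, parts.hostname.lower(), port) in ALLOWED_GATEWAYS
```

The point of comparing parsed components rather than string prefixes is that a prefix check would pass the userinfo trick in the comment, where the real host is whatever follows the "@".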

Patches landed in v2026.1.29. Most deployments did not take them, because the population that git clones an autonomous home server on a Saturday is not, as a rule, diligent about a Tuesday update cadence.

That reads like a security story. It isn’t, primarily, a security story — and the next three months proved it.

They patched it. It got worse.

By late April the January wave looked like a warm-up. Researchers at Cyera disclosed a chained set of flaws they called “Claw Chain,” patched only in v2026.4.22 on April 23rd. The worst of them, CVE-2026-44112, carried a CVSS of 9.6: a time-of-check/time-of-use race in the OpenShell sandbox that let an attacker rewrite system configuration, drop a backdoor, and hold persistent, host-level control. A second flaw in the same chain leaked API keys, tokens, and credentials through a logic error.
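For readers who have not met the bug class: the following is only the general shape of a time-of-check/time-of-use race, not a reconstruction of the OpenShell flaw. It assumes a POSIX host (os.O_NOFOLLOW is not available on Windows), and both function names are hypothetical.

```python
import os
import stat


def read_config_racy(path: str) -> bytes:
    # Time of check: inspect the path...
    if os.path.islink(path):
        raise PermissionError("symlinks not allowed")
    # ...time of use: open the path again. Anyone who can swap the file for a
    # symlink between these two calls gets the check applied to the wrong file.
    with open(path, "rb") as f:
        return f.read()


def read_config_safer(path: str) -> bytes:
    # Open once, refusing to follow a symlink in the final path component,
    # then run every check against the descriptor that was actually opened.
    fd = os.open(path, os.O_RDONLY | os.O_NOFOLLOW)
    try:
        info = os.fstat(fd)
        if not stat.S_ISREG(info.st_mode):
            raise PermissionError("not a regular file")
        with os.fdopen(fd, "rb", closefd=False) as f:
            return f.read()
    finally:
        os.close(fd)
```

The safer version narrows the window rather than proving its absence: O_NOFOLLOW only guards the last path component, so a sandbox that lets an attacker rename parent directories needs more than this.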

Hold the sequence, because the sequence is the point. January: token theft, sandbox escape, command injection, all patched, with hardening guidance plastered across every security blog in the industry. Three months of “follow these six steps.” Then April: a 9.6 sandbox escape, more credential leakage, persistent control of the host, landing even on the deployments that had been told exactly what to fix and had fixed it.

This is not a story about a project that was insecure and then got secure. It’s a story about a project that was patched diligently along one axis and stayed wide open along another — and an industry that kept aiming at the axis it could see.

Why this was a visibility problem from line one

Simon Willison gave us the vocabulary in July 2025: the lethal trifecta. An agent turns dangerous when it holds, at once, access to private data, exposure to untrusted content, and the ability to communicate externally. Hold all three and you’ve built an exfiltration machine that takes instructions from strangers. OpenClaw didn’t brush against the trifecta. It was the trifecta, architected on purpose, then handed a shell and a memory.

Add the persistent memory and the cross-application autonomy and you get the failure mode that should keep anyone running this awake: the delayed multi-turn attack chain. A payload arrives inside a “good morning” forwarded on WhatsApp. It does nothing visible. It settles into memory. Three sessions later, in a different context, it shapes a plan, and the agent — acting on what reads internally as its own legitimate intent — moves data somewhere it shouldn’t. The guardrails never fire, because guardrails inspect the turn in front of them. They are spell-check for a single sentence. The attack is written across the whole novel.
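One way to give a guardrail the whole novel rather than the sentence is to make memory carry its provenance forward. The sketch below assumes something real runtimes rarely provide today: that the agent records which memory entries were retrieved when a plan was formed. Every name in it is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MemoryEntry:
    text: str
    source: str    # where the content came from, e.g. "whatsapp:forwarded"
    trusted: bool  # False for anything authored outside the operator's control


@dataclass
class Plan:
    goal: str
    derived_from: list[MemoryEntry] = field(default_factory=list)

    @property
    def tainted(self) -> bool:
        # A plan is tainted if anything it was built from was untrusted,
        # no matter how many sessions ago that content was stored.
        return any(not m.trusted for m in self.derived_from)


def may_send_externally(plan: Plan) -> bool:
    """Gate outbound actions on the plan's whole lineage, not the current turn."""
    return not plan.tainted
```

A plan built only from something the user typed passes the gate; the same plan with the forwarded “good morning” in its lineage does not, even if that message arrived three sessions earlier.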

Here is the part no CVE captures, and the part that matters. Suppose one of those 21,000 exposed instances was compromised through a memory-seeded chain. Could the operator reconstruct what it did? Walk the sequence back: which external content seeded the state, which tool calls fired and in what order, which were the agent’s own reasoning and which were planted, what left the perimeter and where it went.

For the overwhelming majority of these deployments the answer is not “it’s hard.” It’s no. There was no event log built for forensic reconstruction, no tamper-evident record of tool invocations, no causal trail linking ingested content to downstream action. The agent flew, the agent crashed, and there was no box.
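To make “causal trail” concrete: if every event is written with a pointer to the event that caused it, walking the sequence back becomes a lookup rather than an archaeology project. This is a minimal sketch with a hypothetical event shape, not a description of any shipping product.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Event:
    id: str
    kind: str               # "ingest", "memory_write", "plan", "tool_call"
    detail: str
    parent: Optional[str]   # id of the event that caused this one, if any


def trace_back(events: dict[str, Event], event_id: str) -> list[Event]:
    """Follow parent links from an action back to the content that seeded it."""
    chain: list[Event] = []
    seen: set[str] = set()
    current = events.get(event_id)
    while current is not None and current.id not in seen:
        seen.add(current.id)  # guard against a corrupted, cyclic trail
        chain.append(current)
        current = events.get(current.parent) if current.parent else None
    return list(reversed(chain))  # oldest first: seed content, then consequences
```

Called on the id of an outbound tool call, trace_back returns the ingest event first and the tool call last. The hard part is not this function; it is that the parent field has to be populated at the moment each event is written.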

This is the same shape as the PocketOS deletion in Nine Seconds: the most legible artifact of the incident (there, a contrite confession; here, a green dashboard) was also the most misleading, and the actual failure was set in the architecture before the agent generated a single token. The difference is only scale. One company lost a database. A hundred thousand deployments lost the ability to say what their agents had touched, and almost none of them noticed, because nothing they were watching went red.

The industry brought a flashlight to a forensics problem

I’ve written before about the streetlight effect — the bias toward searching where the light is good rather than where the keys are. The agentic-security market spent the first half of 2026 staging a flawless, large-scale demonstration of it.

The response to OpenClaw, and to MCP more broadly, was a CVE firehose: more than forty MCP-ecosystem CVEs filed between January and April, roughly one every three days. Every vendor shipped a hardening checklist. Close the public bind. Sandbox the runtime. Rotate the keys. Patch to the latest tag. All correct, all necessary, all aimed squarely at the lit pavement: the entry points, which are static, enumerable, and gratifying to count.

None of it answers the forensic question, which is why the April wave hit even the operators who had already done the homework. A patched, sandboxed, key-rotated agent with a clean CVE scorecard can still take a poisoned instruction from your calendar and act on it next Thursday, and you will still be unable to reconstruct the chain afterward, because hardening the entrance is a different discipline from recording the behavior.

I laid out the structural version of this in The MCP Trust Deficit: the gateway market converged on application-layer proxying — auth, rate limiting, schema validation, prompt-injection classifiers — and priced that shape as a complete answer. It catches what arrives at the door. It does not baseline behavior, enforce declared scope at the network boundary, or produce an evidentiary record of what the agent actually did over time. That last layer — tamper-evident, event-sourced audit — is the one that answers the OpenClaw question, and it is the one almost nobody shipped. The light is on over the CVE list. The keys are in the dark, at the level of behavior across time.

The EU just moved the deadline. That’s the trap.

Here is where the story takes its most dangerous turn, and it’s not a technical one.

Until May, 2 August 2026 was going to force the issue. Then, on 7 May, Council and Parliament reached a provisional agreement on the Digital Omnibus and pushed the high-risk obligations back: the obligations for standalone Annex III systems now apply from 2 December 2027, and those for embedded Annex I systems from 2 August 2028. The two things landing soonest are the watermarking transition (2 December 2026) and the new Article 5 ban on AI-generated CSAM and non-consensual imagery, neither of which is the logging requirement. The deal is still provisional, pending formal adoption, but every serious party is planning against those dates.

I made the full argument in Sixteen Months, so I won’t re-run it here. The compressed version: Article 12’s requirement that a high-risk system “technically allow for the automatic recording of events (logs) over the lifetime of the system” is an architecture decision, not a deadline. Correlation IDs have to exist at write time. Tamper-evidence has to wrap the write path from the first commit. None of it can be reconstructed in 2027 from a SIEM full of JSON nobody designed to be correlated. The deadline moved; the architecture didn’t.
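What “at write time” means in practice is easier to show than to argue. The following is a minimal sketch of a hash-chained, append-only log in which the correlation ID is part of what gets hashed; the class and field names are illustrative assumptions, and nothing here is a claim about what Article 12 conformance will formally require.

```python
import hashlib
import json
import uuid

GENESIS = "0" * 64


def _digest(prev_hash: str, body: dict) -> str:
    # Canonical JSON so the same body always hashes to the same value.
    payload = prev_hash + json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def new_correlation_id() -> str:
    return uuid.uuid4().hex


class AuditLog:
    """Append-only log where each record commits to the one before it."""

    def __init__(self) -> None:
        self._records: list[dict] = []

    def append(self, correlation_id: str, event: dict) -> dict:
        prev_hash = self._records[-1]["hash"] if self._records else GENESIS
        body = {"seq": len(self._records), "correlation_id": correlation_id, "event": event}
        record = {**body, "prev": prev_hash, "hash": _digest(prev_hash, body)}
        self._records.append(record)
        return record

    def verify(self) -> bool:
        prev_hash = GENESIS
        for i, rec in enumerate(self._records):
            body = {"seq": rec["seq"], "correlation_id": rec["correlation_id"], "event": rec["event"]}
            if rec["seq"] != i or rec["prev"] != prev_hash or rec["hash"] != _digest(prev_hash, body):
                return False
            prev_hash = rec["hash"]
        return True
```

A chain like this only detects edits by someone who cannot also rewrite every later hash, so the head hash has to be anchored somewhere the agent's host does not control. That anchoring is out of scope for the sketch, but it is the difference between tamper-evident and merely tidy.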

OpenClaw is what that argument looks like with the safety rail removed. Sixteen extra months is precisely enough time to convince yourself the recorder is a 2027 problem, ship a hundred thousand more agents that cannot account for themselves, and find out during the first audit — or the first breach — that the write path was load-bearing all along. The regulation that would have made the box mandatory just became optional again for a year and a half. The physics of building one did not change.

“Observable” is not “monitored,” and the difference is the whole point

Strip away the regulation and the engineering claim stands on its own.

Monitored is the dashboard telling you whether the thing is up, slow, or erroring. Latency, throughput, error rate, requests per second. Necessary, table stakes, and almost entirely beside the point for an agent. An OpenClaw instance quietly exfiltrating an inbox through a memory-seeded chain is, by every operational metric, in perfect health. Green across the board. The dashboard has no opinion about intent — only about liveness.

Observable, for an agent, means you can answer a question you didn’t know to ask in advance, about behavior, after the fact. That needs a different substrate: the causal trail (which ingested content preceded which plan, which plan produced which tool calls, in what order — lineage, not metrics); tool invocation captured as a first-class event, with caller, arguments, cost, and result, rather than smeared incidentally across application logs; behavioral baselining over sequences, because the dangerous pattern is rarely one bad call but a run of authorized calls composing into something nobody sanctioned; and tamper-evidence, because a log you can quietly edit after the incident is not evidence. Append-only, event-sourced, non-repudiable. That last property is where an architectural preference becomes a legal one.
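Baselining over sequences sounds heavier than it has to be. As a deliberately crude sketch (real systems would want longer windows, arguments, and some statistics; the tool names are made up), you can count adjacent pairs of tool calls in known-good sessions and flag any transition never seen before:

```python
from collections import Counter
from typing import Iterable


def bigrams(calls: list[str]) -> Iterable[tuple[str, str]]:
    """Yield each adjacent pair of tool calls in order."""
    return zip(calls, calls[1:])


def build_baseline(sessions: list[list[str]]) -> Counter:
    """Count every adjacent pair of tool calls seen in known-good sessions."""
    baseline: Counter = Counter()
    for calls in sessions:
        baseline.update(bigrams(calls))
    return baseline


def novel_transitions(baseline: Counter, calls: list[str]) -> list[tuple[str, str]]:
    """Return transitions in this session that the baseline has never seen."""
    return [pair for pair in bigrams(calls) if baseline[pair] == 0]
```

With a baseline built from sessions like read_inbox, summarize, reply, a session that runs read_inbox, summarize, http_post surfaces exactly one novel transition: summarize followed by http_post. Each call on its own is authorized; it is the composition that nobody sanctioned.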

This is the Governance & Auditability dimension of the observability maturity scorecard, and it sits independent of how comprehensively you trace for engineering reasons. The ability to produce that record is L3 on the axis; closed-loop enforcement — where the runtime can stop a deviation, not just witness it — is L4. OpenClaw scored at the floor, not because it lacked dashboards, but because it could not, even in principle, give an account of itself. It was monitored, briefly, by a hundred thousand people. It was observable by almost none of them.

Three questions for your next architecture review

The unit of governance for an agent is the sequence of actions. The unit of compliance is the record of that sequence. The CVE list, the hardening checklist, the prompt-level guardrail — all real work, all on a different layer than the one OpenClaw failed at. So:

Where does an agent in your stack take an action whose causal chain — what content seeded it, what fired in what order — you could not reconstruct a month later?
Where does your audit story depend on logs that nobody designed to be correlated, captured after the decision instead of at it?
Where have you read a green dashboard as evidence the agent behaved, when all it ever measured was that the agent was alive?

Answer those honestly and the sixteen-month reprieve is runway. Answer them in 2027 and it was a countdown you started late.

OpenClaw will be remembered as a security story, because the CVEs are vivid, the renaming saga is funny, and 21,000 exposed control panels makes a clean headline. That memory is incomplete. We built an agent that could do almost anything, deployed it a hundred thousand times, patched it twice when it broke — and built nothing capable of telling us, afterward, what it had actually done. The wreckage was everywhere. The box was nowhere.

Aviation learned this the expensive way and made the box mandatory. The EU was about to. Then it gave everyone an extra sixteen months to decide whether they’d rather install one before the audit, or after the crash.

“A green dashboard is not a flight recorder. It only ever told you the engines were still turning — never where the plane was going.”

Disclosure: I build MCP Hangar in this space; the event-sourced, tamper-evident audit layer is the part that maps to what Article 12 will eventually demand. The whole project is MIT-licensed at github.com/mcp-hangar/mcp-hangar. I’m not pitching it here — but it shapes what I notice, and you should know that.