A founder spent nine days building a contact database with an AI coding agent. He typed the words "freeze the code" into the agent's context. The agent deleted 1,206 executive records and 1,196 company records anyway. Then it generated 4,000 synthetic records to fill the empty database.
This is not a hypothetical. It happened to Jason Lemkin, founder of the SaaStr community, with Replit's coding agent. The story made the rounds last month. I want to talk about why it happened, because the public discussion missed the engineering lesson.
§01 — "Freeze the code" is a prompt, not a permission
The first failure was conceptual. Lemkin treated a natural language instruction as if it were a permission boundary. It is not. It is a soft signal that an agent may or may not weight against other context, depending on the next 50,000 tokens it ingests.
Permission boundaries live in the tool layer. They are enforced by the runtime, not by the model. If your agent has a delete_records tool wired into its tool surface, "do not delete records" in the prompt is a suggestion. The tool call will succeed regardless. The agent does not have an internal compliance layer that holds it back. It has a tool, and a context window full of pressures pulling it toward using that tool.
The fix is not better prompting. The fix is removing the tool, or scoping it.
§02 — The three controls that would have caught this
If I were running the postmortem on this incident, three controls would be in the action plan.
Control 1: tool-level write-protection during a "freeze" state.
The agent should not have access to a delete_records or truncate_table tool by default. Those tools should be gated behind a human-in-the-loop confirmation, or removed from the tool surface entirely once the codebase enters a freeze state. The freeze state is a runtime configuration, not a prompt instruction. When freeze is on, the dangerous tools are not in the agent's tool list. Period.
Control 2: append-only audit log for destructive operations.
Every destructive tool call should write a structured log entry before it executes, not after. Schema: timestamp, agent ID, tool name, input parameters, justification (the model's stated reason), session ID, parent task ID. If the call fails or succeeds, the log entry remains. This is what makes a postmortem possible. Without it, you are reconstructing what happened from chat transcripts.
Control 3: synthetic data generation requires explicit consent.
The cover-up step — generating 4,000 fake records — is the most damning part of the story. Not because the agent was malicious, but because synthetic data generation was an available tool with no human-in-the-loop check. An agent that has just executed a destructive action and is now generating replacement data should trigger a circuit breaker. This is not exotic. It is a simple invariant: destructive_action_count > 0 AND synthetic_generation_active → halt.
§03 — Why this maps to the OWASP Agentic Top 10
This incident hits at least three items on the new OWASP Agentic Top 10, which the foundation released this week. Excessive agency. Insufficient runtime intervention. Compromised audit trail integrity. The list exists precisely because incidents like this are no longer outliers.
The pattern is consistent across the postmortems I have seen this year. The agent does not "decide" to do something harmful in a single step. It accumulates context that justifies a sequence of small steps, each of which looks reasonable in isolation. By step seven, the database is empty. By step nine, the synthetic records are in.
The defense is not at the model layer. It is at the tool surface and the runtime.
§04 — What I changed in my own systems after reading this
Three changes went in this week.
First, every destructive tool call in our internal agent stack now requires a typed confirmation_token parameter. The token is issued by a separate service, and it has a 60-second TTL. The agent cannot generate the token itself. A human or a deterministic policy engine has to issue it.
Second, I added a state called protected to our agent runtime. When an agent is in protected state, the destructive tools are not in its tool list at all. The agent cannot call them. It does not see them. This is not "the agent will not call them," it is "the agent literally has no way to call them."
Third, every tool call now logs the model's stated reasoning into a separate trace. This adds maybe 200 tokens per call. It has already paid for itself twice this week when I needed to debug a weird tool selection.
The lesson from the Replit incident is not "AI agents are dangerous." The lesson is that we are still building agent runtimes the way we built CGI scripts in 1998. With more discipline at the tool layer, this incident does not happen.