§ Pattern · May 26, 2026 · 11 min

The trust boundary pattern — the architectural decision you are already making, whether you know it or not

Every production agent system has a trust boundary. Most teams never defined theirs explicitly. Here is the framework for deciding where it sits, the four most common placements, and the failure modes that emerge when the boundary drifts.

> ../patterns/trust_boundary.md

§ 01 · The definition

A trust boundary is the line between what your agent is permitted to decide autonomously and what requires a human decision, an external approval, or an explicit override.

Every production agent has one. Most teams never wrote it down. Instead, they have a collection of implicit decisions — tools that weren't added, permissions that weren't granted, escalation paths that were wired in "to be safe" — that together constitute an accidental trust boundary.

Accidental trust boundaries cause two failure modes. They are either too permissive (the agent does things the team didn't realize it could do) or too restrictive (the agent constantly escalates decisions that were safe to make autonomously, eroding the value of having an agent at all).

The pattern here is how to make the decision explicit.

§ 02 · Four boundary placements

Placement I — Model boundary. The trust boundary is at the model layer. The agent can reason about anything, but takes no action. Every output is a recommendation; every action is performed by a human after review. Use case: high-stakes domains where the cost of an incorrect autonomous action exceeds the cost of human review latency. Legal drafting, medical records, financial transactions above a threshold.

Placement II — Read boundary. The agent can read any data it has access to, but cannot write, modify, or trigger external actions. It can retrieve, summarize, analyze, and recommend. It cannot execute. Use case: research and analysis agents in regulated environments. Intelligence is inside the boundary; action is outside it.

Placement III — Scoped write boundary. The agent can read and write, but only within a defined domain. It can modify records in a specific table, send messages in a specific channel, update files in a specific directory. Writes outside the scope require explicit approval. Use case: the majority of production operational agents. This is the correct default placement for most teams.

Placement IV — Supervised autonomy. The agent can read, write, and take actions across a broad scope — but with circuit breakers, rate limits, kill switches, and audit trails that allow intervention. The boundary is not about what the agent can do; it is about how quickly a human can undo it. Use case: high-trust, high-volume workflows where the cost of manual approval per action exceeds the cost of occasional cleanup.
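One way to make the four placements concrete is to encode them as a value that every proposed action is checked against. The sketch below is illustrative only: `Placement`, `decide`, the three outcome strings, and the resource names are hypothetical, and it assumes that under Placement I even a read counts as an action the agent may only recommend.

```python
from enum import Enum


class Placement(Enum):
    MODEL = "I"          # recommend only; a human performs every action
    READ = "II"          # read freely, never write
    SCOPED_WRITE = "III" # write only inside a named scope
    SUPERVISED = "IV"    # broad scope, guarded by breakers and audit trails


def decide(placement, kind, target, write_scope=frozenset()):
    """Classify one proposed action.

    kind is "read" or "write"; target is a hypothetical resource name.
    Returns "allow", "recommend_only", or "needs_approval".
    """
    if placement is Placement.MODEL:
        # Assumption: nothing is executed by the agent at this placement.
        return "recommend_only"
    if kind == "read":
        return "allow"
    if placement is Placement.READ:
        # Writes are outside the boundary; the agent can only suggest them.
        return "recommend_only"
    if placement is Placement.SCOPED_WRITE:
        return "allow" if target in write_scope else "needs_approval"
    # SUPERVISED: the write proceeds; rate limits, kill switches, and
    # audit logging are assumed to be enforced elsewhere.
    return "allow"


scope = frozenset({"tickets"})
print(decide(Placement.SCOPED_WRITE, "write", "tickets", scope))
print(decide(Placement.SCOPED_WRITE, "write", "billing", scope))
```

The point of the sketch is that the placement is a single declared value rather than something inferred from which tools happen to be wired in.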

§ 03 · The four questions that define the boundary

For any agent in production, answer these four questions before deploying:

What is the worst irreversible action this agent could take? If the answer is "delete production records" or "send a customer communication I can't unsend," that action must sit outside the boundary or require an explicit approval gate.

What is the blast radius of a misfire? If the agent writes to one record per run, the blast radius is one. If it writes to a table, the blast radius is the table. Size your recovery cost before you set your boundary.

What does escalation look like? Every agent needs a clear, tested escalation path. "The agent flags uncertain decisions" is not an escalation path. "The agent posts to #agent-review-queue with a 2-hour resolution SLA" is an escalation path.

Can the boundary drift? Scope creep in agent permissions is real. An agent that started with read access to one database gains write access, and later has access to three databases. Define the boundary in a version-controlled document, not just in your tool configuration, and review it quarterly.
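The first three questions can be turned into a small gate that runs before any action executes. This is a minimal sketch under stated assumptions: the action names, the per-run record cap, and the `gate` and `Escalation` names are hypothetical, while the channel and the 2-hour SLA are taken from the example above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical values for illustration; substitute your own.
IRREVERSIBLE_ACTIONS = {"delete_records", "send_customer_email"}
MAX_RECORDS_PER_RUN = 1  # blast-radius cap for a single run


@dataclass
class Escalation:
    channel: str
    reason: str
    resolve_by: datetime


def gate(action, record_count, now, sla=timedelta(hours=2)):
    """Return None if the action may proceed, otherwise an Escalation."""
    if action in IRREVERSIBLE_ACTIONS:
        reason = f"{action} is irreversible"
    elif record_count > MAX_RECORDS_PER_RUN:
        reason = (
            f"{action} touches {record_count} records "
            f"(cap {MAX_RECORDS_PER_RUN})"
        )
    else:
        return None
    # An escalation names a queue and a deadline, not just "uncertain".
    return Escalation("#agent-review-queue", reason, now + sla)


now = datetime(2026, 5, 26, 9, 0)
print(gate("update_record", 1, now))
print(gate("delete_records", 1, now))
```

Posting the escalation to the queue and tracking the deadline are left out; the sketch only shows that the decision to escalate is explicit and testable.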

§ 04 · Boundary drift — the failure mode

The most common trust boundary failure is not a single wrong decision. It is gradual drift — a sequence of small permission expansions, each of which seemed reasonable at the time, that collectively move the agent into a scope no one intended to authorize.

The pattern: initial deployment is conservative. The agent proves reliable. A new use case emerges that requires slightly broader access. The permission is added. Six months later, the agent has access to systems that weren't in the original design — and no one has a complete map of what it can now do.

The fix is not technical. It is procedural. Treat agent permission changes with the same review process as production code changes. Require a written justification for every boundary expansion. Review the complete permission set quarterly, not just the delta.
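A small script can support the quarterly review without replacing it, by comparing the last approved permission set against what the agent holds today. The sketch below assumes permissions can be exported as a mapping from tool name to a set of scopes; `permission_drift` and the database names are hypothetical.

```python
def permission_drift(approved, current):
    """Report scopes present in `current` that were never approved.

    Both arguments map a tool name to a set of scope strings.
    """
    drift = {}
    for tool, scopes in current.items():
        extra = set(scopes) - set(approved.get(tool, ()))
        if extra:
            drift[tool] = sorted(extra)
    return drift


approved = {"orders_db": {"read"}}
current = {"orders_db": {"read", "write"}, "billing_db": {"read"}}

print(permission_drift(approved, current))
# {'orders_db': ['write'], 'billing_db': ['read']}
```

Each entry in the result is a boundary expansion that should have a written justification on file. An empty result means the current configuration matches what was approved.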

§ 05 · The file that holds the boundary

Create a TRUST.md in every agent repository. It is a single version-controlled document that contains: the chosen boundary placement, the answers to the four questions, the complete tool list with read/write scope for each tool, the escalation path with its SLA, the last boundary review date, and the name of the human who approved the current configuration.

This is not bureaucracy. This is the document your team will need the day something goes wrong — and the day an auditor asks what your agent was permitted to do.
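If the contents of TRUST.md are also kept in a structured form, a check in CI can flag a record that is incomplete or overdue for review. This is a sketch, not a prescribed format: the `TrustRecord` fields mirror the list above, while the field names, the `problems` helper, and the 92-day window standing in for "quarterly" are assumptions.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class TrustRecord:
    placement: str            # e.g. "III (scoped write)"
    tools: dict               # tool name -> "read" or "read/write"
    escalation_path: str
    escalation_sla_hours: int
    last_review: date
    approved_by: str


def problems(record, today, max_age_days=92):
    """List what is missing or stale in a trust record."""
    issues = []
    if not record.approved_by.strip():
        issues.append("no named approver")
    if not record.tools:
        issues.append("tool list is empty")
    if (today - record.last_review).days > max_age_days:
        issues.append("boundary review is overdue")
    return issues


record = TrustRecord(
    placement="III (scoped write)",
    tools={"orders_db": "read/write"},
    escalation_path="#agent-review-queue",
    escalation_sla_hours=2,
    last_review=date(2026, 1, 5),
    approved_by="",
)
print(problems(record, today=date(2026, 5, 26)))
# ['no named approver', 'boundary review is overdue']
```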

—— End of pattern ——

Every agent has a trust boundary. The question is whether you defined it or inherited it.

Scoped write (Placement III) is the correct default for most production agents.

If you do not have a TRUST.md, your boundary is drifting. You just haven't noticed yet.

— ORBIRESEARCH
