← Back to The Lab
§ PatternMay 12, 202611 min

Context engineering at the right altitude, the discipline that replaces prompt engineering in production

Prompt engineering optimizes a single question. Context engineering decides what the agent can see, when, and at what resolution. Here is the framework for choosing altitude, the four most common failure modes, and the file structure that holds up under audit.

ShareXLinkedInFacebook

> ../patterns/context_altitude.md

The phrase "prompt engineering" is doing damage. It tells junior engineers that the model is the bottleneck and that better wording is the lever. In production, it is almost never the lever. The 2026 State of AI Agents report puts 46% of practitioners citing system integration, not model capability, as the primary blocker. The AI Agent Conference 2026 stated it plainly during the Agentic Engineering track: agent performance depends less on how you ask a question and more on the information architecture surrounding the agent.

Context engineering is the discipline of deciding three things, in order:

what data sources the agent can see

how that data is structured before it enters the model

what gets retrieved when

The mistake most teams make is to treat the context window as infinite scratch space. It is not. Every token has two costs, the API charge and the attention cost. Irrelevant tokens reduce the model's ability to focus on the ones that matter. We have measured the same agent, on the same task, drop from 84% task completion to 41% by adding 8,000 tokens of "helpful background." The background was technically correct. It was the wrong altitude.

—— The altitude framework ——

Think of context like a map. A road atlas of Europe is useless for navigating one neighborhood in Belgrade. A street-level scan of one block is useless for planning a route from Frankfurt. The same content, at the wrong scale, becomes noise.

We use four altitudes when designing an agent's context plan.

Layer I — Strategic. Mission, scope, refusal boundaries. What this agent is, what it never does. Loaded once per session. Roughly 300 to 600 tokens. Stable across runs.

Layer II — Operational. Tool contracts, error taxonomy, idempotency conventions, retry rules. Loaded once per session. Roughly 800 to 1,500 tokens. Changes only when tools change.

Layer III — Tactical. The specific task, the user input, the immediate sub-goal. Refreshed every turn. Variable size, ideally under 2,000 tokens.

Layer IV — Reference. Retrieved evidence. Documents, schemas, prior decisions, prior tool outputs. Loaded just-in-time, only the relevant slice. Hard cap before the call, not after.

The discipline is to know which altitude any given piece of information belongs to. A list of API endpoints is Layer II. A specific endpoint response from two turns ago is Layer III if still relevant, Layer IV if archived. A company style guide is Layer I if the agent writes anything, Layer IV if it only writes occasionally and on request.

—— The four failure modes ——

Altitude collision. Strategic content loaded as Tactical, refreshed every turn. The system prompt grows by 200 tokens per message because someone is concatenating, not layering. The agent forgets the most recent user instruction because the strategic preamble pushed it out of working attention.

Premature retrieval. Layer IV content loaded before the agent has decided whether it needs it. We have seen pipelines that pull 12 documents from a vector store on every turn, regardless of question. Three of those documents are usually relevant. The other nine are taxing attention and budget.

Stale tactical context. Layer III content from three turns ago, still in the window because the implementation appends instead of windows. The agent confidently answers the wrong question, the one it was asked four messages back.

Schema drift in Layer II. Tool descriptions change in the SDK, the agent's loaded contract does not. The model emits valid-looking calls that the new tool layer rejects. This was the n8n v2.6.3 failure pattern earlier this year, schema drift between an upgraded tool generator and downstream LLM APIs broke every production workflow until the version was rolled back.

—— The file structure ——

We give every production agent a directory, not a prompt. The structure is non-negotiable.

agent/
  context/
    layer_01_strategic.md      # mission, scope, refusal boundaries
    layer_02_operational.md    # tool contracts, error taxonomy, retry rules
    layer_03_tactical.md       # task template, user input schema
    layer_04_reference.md      # retrieval triggers, evidence filters
  contracts/
    tool_*.md                    # one per tool: schema, failure modes, idempotency
  tests/
    boundary_*.md                # permission violation scenarios
    altitude_*.md                # context budget tests per layer
  context_plan.md              # the file most teams skip

CONTEXT_PLAN.md is the file most teams skip. It is the one that prevents the four failure modes above. It states, in plain language, which layer each information source belongs to, what triggers retrieval, and what the token budget is per layer. If the budget is exceeded, the plan dictates which layer gets compressed first. Layer III before Layer IV. Layer IV before Layer II. Layer II before Layer I, never.

—— What to measure ——

If you cannot answer these four questions about your agent, you do not have a context plan, you have a prompt.

What is the token budget per layer, and how often is each layer exceeded in production?

Which Layer IV retrievals had zero impact on the final tool call? Those are the ones to remove.

When the agent fails, can you reconstruct exactly what it could see at the moment of the failed decision?

When a tool contract changes, what is the propagation path to the agent's Layer II context, and how long does that take?

If those four traces exist, you are doing context engineering. If they do not, you are doing prompt engineering with extra steps.

—— End of pattern ——

Layer I → identity. Layer II → tools. Layer III → task. Layer IV → evidence.

Mix them, and the agent drifts. Layer them, and it ships.

— ORBIRESEARCH

ShareXLinkedInFacebook