Three things landed this week. All three confirm the same thesis: the gap between agents that demo well and agents that run in production is an engineering problem, not a prompting problem.
1. G2 surveyed 770 reviews and 7 vendors: orchestration is the ceiling
G2 published its 2026 State of AI Agent Builders report, based on 770 verified reviews and direct input from seven vendors. The finding that matters most: vendors reported that their orchestration layer covers an average of 3.4 out of 5 possible roles. And when asked what causes agent workflow failures, 6 out of 7 vendors identified API and system integration failures as the primary cause.
Not hallucinations. Not bad reasoning. Integration failures.
This maps directly to what we see in production. The hard part is never getting a single agent to work. The hard part is getting Agent A to hand off cleanly to Agent B, with validated outputs, shared state, and conflict resolution that doesn't cascade into failure.
One vendor put it precisely: "Managing how specialized agents communicate, share state, and resolve conflicts without creating circular logic loops or cascading failures becomes exponentially complex." They call it the "Orchestration Ceiling." We call it Miro Frame 03 and a Sequence Diagram.
The practical takeaway: if you're building a multi-agent system and you don't have formal handoff contracts between agents, you will hit this ceiling. It's not a matter of if. Our Tool Contract Library exists specifically for this: every input, output, failure mode, and retry behavior is documented before any agent touches production.
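To make "formal handoff contract" concrete, here is a minimal Python sketch of the idea. It is not the Tool Contract Library itself: every name in it (HandoffContract, hand_off, agent_a, agent_b, the field names) is hypothetical and chosen only for illustration. It assumes each agent is a plain function and that a contract amounts to required fields, their types, and a documented retry limit.

```python
from dataclasses import dataclass
from typing import Any, Callable


class HandoffError(Exception):
    """Raised when an agent's output violates the handoff contract."""


@dataclass
class HandoffContract:
    # Hypothetical contract shape: required output fields plus retry behavior.
    name: str
    required_fields: dict[str, type]  # field name -> expected type
    max_retries: int = 2              # retries allowed after the first attempt

    def validate(self, payload: dict[str, Any]) -> None:
        """Reject payloads with missing or wrongly typed fields."""
        for key, expected in self.required_fields.items():
            if key not in payload:
                raise HandoffError(f"{self.name}: missing field '{key}'")
            if not isinstance(payload[key], expected):
                raise HandoffError(
                    f"{self.name}: field '{key}' is not {expected.__name__}"
                )


def hand_off(
    producer: Callable[[int], dict[str, Any]],
    consumer: Callable[[dict[str, Any]], Any],
    contract: HandoffContract,
) -> Any:
    """Run the producer, validate its output, then pass it to the consumer.

    The consumer never sees a payload that failed validation.
    """
    last_error = None
    for attempt in range(contract.max_retries + 1):
        payload = producer(attempt)
        try:
            contract.validate(payload)
        except HandoffError as err:
            last_error = err
            continue  # documented retry behavior: ask the producer again
        return consumer(payload)
    raise HandoffError(
        f"{contract.name}: gave up after {contract.max_retries + 1} attempts"
    ) from last_error


# Usage: Agent A forgets 'sources' on its first attempt, then recovers.
contract = HandoffContract("research_to_writer", {"summary": str, "sources": list})


def agent_a(attempt: int) -> dict[str, Any]:
    if attempt == 0:
        return {"summary": "draft"}
    return {"summary": "draft", "sources": ["report"]}


def agent_b(payload: dict[str, Any]) -> str:
    return f"{payload['summary']} ({len(payload['sources'])} source)"


result = hand_off(agent_a, agent_b, contract)
```

The design choice worth noting is that validation sits between the agents rather than inside either one, so a malformed output fails at the boundary instead of cascading downstream.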
2. NASA researchers published CARE, a methodology that formalizes agent engineering
A team from NASA published a paper called CARE: Collaborative Agent Reasoning Engineering. It introduces a disciplined, artifact-driven methodology for engineering LLM agents, formalizing behavior, grounding, tool orchestration, and verification through reusable artifacts and stage-gated phases.
Read that again: artifact-driven, stage-gated phases.
That is almost word-for-word how our four-layer architecture works. Miro defines the behavior and grounding. Notion documents the artifacts and contracts. Mermaid formalizes the verification. Hermes implements after all gates pass.
The difference: CARE is a research paper. Our methodology has been running in production for months with real clients.
But the signal matters enormously. When NASA-affiliated researchers independently arrive at the same structural conclusion (that ad-hoc agent building produces unreliable systems, and that you need formal engineering stages), it validates the entire category we're building in. Agent engineering is becoming a real discipline, not a side effect of prompt engineering.
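For readers who think in code, the stage-gated idea above can be sketched in a few lines of Python. This is an illustration only, not the CARE methodology and not our internal tooling: the gate names, the artifact keys (behavior_spec, tool_contracts, sequence_diagram), and the function first_failed_gate are all hypothetical. The assumption is simply that each stage produces an artifact and that implementation is blocked until every gate, checked in order, finds its artifact present.

```python
from typing import Any, Callable, Optional

# Hypothetical gates, checked in order. Each one only tests that the
# artifact for its stage exists and is non-empty.
GATES: list[tuple[str, Callable[[dict[str, Any]], bool]]] = [
    ("behavior_and_grounding", lambda a: bool(a.get("behavior_spec"))),
    ("artifacts_and_contracts", lambda a: bool(a.get("tool_contracts"))),
    ("verification", lambda a: bool(a.get("sequence_diagram"))),
]


def first_failed_gate(artifacts: dict[str, Any]) -> Optional[str]:
    """Return the name of the first gate that fails, or None if all pass."""
    for name, check in GATES:
        if not check(artifacts):
            return name
    return None


# Usage: verification artifact is missing, so implementation stays blocked.
blocked_on = first_failed_gate(
    {"behavior_spec": "support triage agent", "tool_contracts": ["crm_lookup"]}
)
```

In this sketch, implementation would start only when first_failed_gate returns None.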
3. MCP crossed 10,000 servers, and tool security just became urgent
The Model Context Protocol now has over 10,000 public servers deployed globally. MCP was donated to the Agentic AI Foundation, cementing it as open infrastructure rather than a single company's project.
But open access creates real attack surface. Salesforce flagged a specific threat: tool poisoning attacks, where malicious MCP servers manipulate agent behavior through injected instructions. When your agent can connect to thousands of external servers, every connection is a potential vulnerability.
This is why every tool in our system goes through a Tool Contract before it reaches production. The contract doesn't just define inputs and outputs; it defines what the tool must never do, what permissions it requires, and what happens when it returns unexpected data. Without that contract, connecting to an MCP server is like giving a stranger the keys to your system and hoping they're trustworthy.
If you're using MCP in production: audit every server connection, define explicit permission boundaries, and log every tool call. The convenience of plug-and-play tool access is real. So is the risk.
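The three recommendations above (audit connections, define permission boundaries, log every call) can be sketched as a thin wrapper around whatever client actually talks to your servers. This is a minimal Python sketch under stated assumptions: it does not use any real MCP SDK, and ServerPolicy, AuditedToolClient, fake_transport, and the server and tool names are all hypothetical. The transport callable stands in for your real client.

```python
import logging
from dataclasses import dataclass, field
from typing import Any, Callable

logger = logging.getLogger("tool_audit")


class PermissionDenied(Exception):
    """Raised when a call falls outside the explicit allowlist."""


@dataclass(frozen=True)
class ServerPolicy:
    # Hypothetical permission boundary: which tools a server may expose to us.
    server: str
    allowed_tools: frozenset[str]


@dataclass
class AuditedToolClient:
    policies: dict[str, ServerPolicy]
    # Assumed signature for the underlying client: (server, tool, args) -> result
    transport: Callable[[str, str, dict[str, Any]], Any]
    audit_log: list[dict[str, Any]] = field(default_factory=list)

    def call(self, server: str, tool: str, args: dict[str, Any]) -> Any:
        policy = self.policies.get(server)
        # Deny by default: unknown servers and unlisted tools are both refused.
        allowed = policy is not None and tool in policy.allowed_tools
        entry = {"server": server, "tool": tool, "args": args, "allowed": allowed}
        self.audit_log.append(entry)         # record every attempt, even denied ones
        logger.info("tool call: %s", entry)  # emitted only if logging is configured
        if not allowed:
            raise PermissionDenied(f"{server}/{tool} is not on the allowlist")
        return self.transport(server, tool, args)


# Usage with a stand-in transport that just echoes its arguments.
def fake_transport(server: str, tool: str, args: dict[str, Any]) -> dict[str, Any]:
    return {"echo": args}


client = AuditedToolClient(
    policies={"docs-server": ServerPolicy("docs-server", frozenset({"search"}))},
    transport=fake_transport,
)
result = client.call("docs-server", "search", {"q": "orchestration"})
```

An allowlist limits which tools can be reached but does not inspect what a tool returns, so it is only one layer; handling of unexpected or instruction-bearing output still belongs in the contract.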
This week's signal summary
· Orchestration, not prompting, is the real scaling bottleneck. Document your agent handoffs before you build them.
· Formal agent engineering methodologies are emerging from research institutions. The ad-hoc era is ending.
· MCP is powerful infrastructure, but every connection needs a security contract. Trust nothing by default.
See you next Friday.
— ORBIRESEARCH