OpenAI ships sandboxing as a primitive, 90% of agents fail security audits, and the OWASP Agentic Top 10 lands, The Lab

Three signals worth your attention this week: model-native agent harnesses are arriving, an academic consortium quantifies the security gap, and a new standard for agent threats is now public.

This week marked a turning point in how the industry talks about agents. Less about what they can do, more about what they should not be allowed to do.


§01 — OpenAI moves the agent loop into the model

OpenAI shipped an Agents SDK update with two changes that matter for production teams. Native sandbox execution is now a first-class primitive, and the agent control loop has moved closer to the model itself.

The practical effect for those of us shipping agents: fewer malformed tool calls, better failure recovery, and the elimination of an entire class of "stuck in a loop" bugs that every framework user has debugged at 2 AM.

This narrows the gap between the OpenAI Agents SDK and Claude's Agent SDK on core features. The differentiation is shifting from "what the harness does" to "what the model does inside the harness." For teams choosing between them, the decision now hinges on tool ecosystem, pricing, and how each handles long-horizon state, not on which has the better retry logic.

What this means for the four-layer architecture: Layer IV (Hermes / implementation) shrinks. Less retry boilerplate, less custom loop scaffolding. More time to spend on Layer I and II, where the actual leverage lives.

§02 — The Elloe AI Lab paper quantifies the security gap

A consortium including Stanford, MIT, Carnegie Mellon, IT University of Copenhagen, and Nvidia analyzed 847 agent deployments across healthcare, finance, customer service, and software. The numbers are sobering.

Nine in ten autonomous agents in production are vulnerable to a class of attack that standard safety testing cannot detect. Among agents that retain memory across sessions, 94% proved vulnerable to memory-poisoning attacks. Multi-agent systems showed a 78% vulnerability rate to delegation failures, where a compromised subagent propagates malicious instructions across the network.

Nearly 90% of all tested agents showed measurable goal drift after roughly 30 steps of operation. This is the failure mode that single-turn evaluations cannot catch. It is also the failure mode most teams are not testing for.
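To make the single-turn blind spot concrete, here is a minimal sketch of a long-horizon check. Everything in it is an assumption for illustration: `first_drift_step`, `toy_agent`, and the string-equality goal check are hypothetical stand-ins, not the paper's methodology, and the toy agent is hard-coded to wander after step 30 only to mirror the figure above.

```python
from typing import Callable, Optional


def first_drift_step(
    step: Callable[[int], str],
    on_goal: Callable[[str], bool],
    horizon: int = 50,
) -> Optional[int]:
    """Run the agent for `horizon` steps and return the first 1-based
    step whose action fails the goal check, or None if none fails."""
    for n in range(1, horizon + 1):
        if not on_goal(step(n)):
            return n
    return None


def toy_agent(n: int) -> str:
    # Hypothetical agent: on task for 30 steps, then it wanders.
    return "summarize ticket" if n <= 30 else "browse unrelated site"


def is_on_goal(action: str) -> bool:
    # Hypothetical goal check; a real one would be far less trivial.
    return action == "summarize ticket"


# A single-turn evaluation only ever sees step 1 and passes.
single_turn = first_drift_step(toy_agent, is_on_goal, horizon=1)

# A long-horizon evaluation keeps checking and catches the drift.
long_run = first_drift_step(toy_agent, is_on_goal, horizon=50)

print(single_turn, long_run)
```

The point of the sketch is the `horizon` parameter: the same agent and the same check give opposite verdicts depending on how many steps the evaluation runs.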

The takeaway is not "build more secure agents." The takeaway is: monitoring alone is insufficient if the system cannot intervene at runtime. Detection without intervention is theater.
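One way to picture the difference between detection and intervention is a guard that sits in the tool-call path instead of beside it. The sketch below is an assumption-laden illustration: `RuntimeGuard`, `ToolCall`, `run_tool`, the allowlist, and the step budget are all hypothetical names and policies, not part of any SDK or of the paper.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ToolCall:
    tool: str
    args: dict


@dataclass
class RuntimeGuard:
    """Hypothetical guard placed between the agent and its tools."""
    allowed_tools: set
    max_steps: int = 30  # illustrative budget, not a recommended value
    steps: int = 0
    audit_log: list = field(default_factory=list)

    def check(self, call: ToolCall) -> bool:
        """Return True if the call may run; log every decision."""
        self.steps += 1
        if self.steps > self.max_steps:
            reason = "step budget exceeded"
        elif call.tool not in self.allowed_tools:
            reason = "tool not on allowlist"
        else:
            reason = None
        self.audit_log.append((call.tool, reason or "allowed"))
        return reason is None


def run_tool(guard: RuntimeGuard, call: ToolCall, tools: dict) -> dict:
    """Execute the tool only if the guard approves; otherwise refuse."""
    if not guard.check(call):
        return {"status": "blocked", "tool": call.tool}
    return {"status": "ok", "result": tools[call.tool](**call.args)}


# Toy tool registry and a deliberately small step budget.
tools = {"search": lambda query: f"results for {query}"}
guard = RuntimeGuard(allowed_tools={"search"}, max_steps=2)

first = run_tool(guard, ToolCall("search", {"query": "owasp"}), tools)
second = run_tool(guard, ToolCall("delete_records", {"table": "users"}), tools)
third = run_tool(guard, ToolCall("search", {"query": "again"}), tools)
```

A monitor-only setup would produce the same `audit_log` but still execute the second and third calls. Here the log and the refusal come from the same code path, so nothing is detected without also being stopped.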

§03 — OWASP Agentic Top 10 is published

The OWASP Foundation released a Top 10 list specifically for agentic applications, cataloguing how agents have already been exploited in production. This is the same OWASP that publishes the Top 10 for web application security, the one your security team already trusts.

Why this matters now: until this week, "agent security" was a fragmented conversation. Different vendors, different frameworks, different threat models. The Top 10 gives the industry a shared vocabulary. Expect procurement checklists to start referencing it within weeks.

Practical move for your stack this week: pull the list, map each item to your current agent architecture, and identify the three you have no answer for. That is your next sprint.
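The mapping exercise can be as simple as a dictionary and a filter. In this sketch the item IDs are placeholders (`ITEM-01` through `ITEM-10`), not the real entry names from the published list, and the controls are invented examples; swap in the actual items and whatever your stack really runs.

```python
# Placeholder IDs; replace with the entries from the published list.
items = [f"ITEM-{n:02d}" for n in range(1, 11)]

# Hypothetical mapping from each item to controls already in place.
# An empty list or a missing key both mean "no answer yet".
controls = {
    "ITEM-01": ["prompt input filtering"],
    "ITEM-02": ["tool allowlist", "argument validation"],
    "ITEM-03": [],
    "ITEM-04": ["per-agent credentials"],
    "ITEM-06": ["memory write review"],
    "ITEM-08": [],
    "ITEM-09": ["human approval for payments"],
}


def find_gaps(items: list, controls: dict) -> list:
    """Return the items that have no control mapped to them."""
    return [item for item in items if not controls.get(item)]


gaps = find_gaps(items, controls)
next_sprint = gaps[:3]  # naive pick; rank by your own risk first

print("Uncovered:", gaps)
print("Next sprint:", next_sprint)
```

Taking the first three gaps is only a default. In practice you would sort `gaps` by exposure or blast radius before cutting the list.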

§04 — What I'm watching next

The 1H 2026 State of AI and API Security Report dropped numbers that should change how you think about agent infrastructure. 47% of organizations have delayed a production release because of API security concerns tied to autonomous systems. Nearly half are blind to machine-to-machine traffic from their own agents.

This is the bottleneck nobody puts on a slide. The agents work. The infrastructure they run on is not ready for them.

Next week's signal will dig into MCP server security specifically, since that is where this gap is widest.

See you next Friday.

— ORBIRESEARCH