There is a standard explanation for why AI agents fail, and it is usually wrong.
The standard explanation is that the model hallucinated, or the training data was poor, or the context window was insufficient. These things happen. But in a system of 24 agents running in production — which is what I have been building and operating for the past two years — the most common failure source is none of them.
The most common failure source is the specification. The agent was not told precisely enough what it is responsible for, what it must not do, where its authority ends, and what it should do when it encounters a situation that does not fit its defined scope.
I write this as someone with a background in root cause analysis software. RiverSoft, SMARTS, Voyence — all in the fault isolation and network root cause analysis space, all subsequently acquired by IBM or EMC. The lesson from that entire category of software is that the presenting failure is rarely the root failure. The presenting failure in an AI system is usually a bad output. The root failure, in a significant proportion of cases, is a specification that left a gap the agent filled with its own inference.
What specification precision actually means
I am going to be specific rather than theoretical here, because this is a practical problem that benefits from concrete illustration.
When I write an agent specification for the WAT system — my operational AI system, built on Claude, running 24 specialist agents — the specification defines a set of things that most AI system builders treat as optional. They are not optional. They are the difference between an agent that behaves predictably and one that improvises in ways you did not anticipate.
Scope. What exactly does this agent do? Not the category — the specific tasks. Not “this agent handles content quality” but “this agent checks for banned word lists, em dash usage, and American spelling variants in editorial content. It does not rewrite content. It does not make editorial judgements. It flags and scores.”
Boundaries. What does this agent explicitly not do? This seems redundant. It is not. An agent without explicit scope boundaries will fill gaps with reasonable inference. The inference will sometimes be correct. When it is not, you have an agent that has taken an action outside its authority, and the output is unpredictable.
Interface. What inputs does this agent receive and in what format? What outputs must it produce and in what format? The interface definition is the contract. An agent that receives ambiguous inputs will interpret them. If your downstream systems expect a specific output format and the agent decides to provide a more helpful one, you have an integration failure.
Escalation. What does this agent do when it encounters a situation outside its defined scope? Does it stop and ask? Does it flag and continue? Does it make a best-effort attempt and log the uncertainty? This is rarely specified and almost always matters.
Authority limits. What actions can this agent take autonomously, and which require explicit approval? In my system, agents with write access to production databases operate under approval gates for irreversible actions. The gate is specified, not assumed.
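The five elements above can be captured as a structured object rather than scattered prose. This is a minimal Python sketch with hypothetical names, loosely modelled on the content-quality agent described earlier; it is an illustration of the shape of a specification, not the WAT implementation:

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class AgentSpec:
    """Illustrative specification covering scope, boundaries, interface,
    escalation, and authority limits."""
    name: str
    scope: list[str]              # specific tasks, never categories
    boundaries: list[str]         # explicit "does not do" list
    input_format: str             # interface: what the agent receives
    output_format: str            # interface: what it must produce
    escalation: str               # behaviour outside scope
    autonomous_actions: list[str]                      # allowed without approval
    gated_actions: list[str] = field(default_factory=list)  # require approval

quality_checker = AgentSpec(
    name="content-quality-checker",
    scope=[
        "check banned word list",
        "check em dash usage",
        "check American spelling variants",
    ],
    boundaries=[
        "does not rewrite content",
        "does not make editorial judgements",
    ],
    input_format="editorial content as plain text",
    output_format="JSON object with 'flags' list and 'score' integer",
    escalation="stop_and_ask",
    autonomous_actions=["flag", "score"],
    # no gated_actions: this agent takes no irreversible actions
)
```

Writing the specification as data also makes it reviewable: a missing boundaries list or an empty escalation field is visible at a glance.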
The failure mode this produces in practice
The failure pattern I see most often is what I call inference-at-the-boundary. The agent is well-specified for its core task. The boundary cases are underspecified. When the agent encounters a boundary case, it does not stop. It infers what a reasonable interpretation of its mandate would suggest and acts on that inference.
This is not a model flaw. It is a specification flaw. A well-specified agent knows that it does not know — because the specification tells it, explicitly, what falls inside and outside its scope, and what to do when it is uncertain.
The agents in my system have what I call anti-exemplars: documented examples of failure patterns specific to each agent’s role. A content writer agent, for instance, has anti-exemplars showing what it looks like when it starts rewriting source material it was supposed to copy verbatim. A data collection agent has anti-exemplars showing what it looks like when it starts synthesising data it was supposed to retrieve. These are known failure modes, documented precisely because they occurred and were diagnosed. The specification tells the agent: when you see these conditions, recognise them as the known failure mode and stop.
This is specification precision applied to failure cases. It is the most important part of the specification and the most commonly omitted.
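One way to make anti-exemplars operational is to register each documented failure mode alongside a cheap detection signal that can be checked against an agent's output. The sketch below is illustrative, not the WAT mechanism: the roles, entries, and `signal` functions are hypothetical stand-ins for real diagnostics.

```python
# Hypothetical anti-exemplar registry: each entry records a known,
# previously diagnosed failure mode for one agent role, plus a crude
# signal that fires when the output resembles that failure.
ANTI_EXEMPLARS = {
    "content-writer": [
        {
            "failure": "rewrites source material meant to be copied verbatim",
            # fires when the verbatim source no longer appears in the output
            "signal": lambda source, output: source not in output,
        },
    ],
    "data-collector": [
        {
            "failure": "synthesises data it was supposed to retrieve",
            # fires on hedging language that retrieved data should not contain
            "signal": lambda source, output: "estimated" in output.lower(),
        },
    ],
}

def check_anti_exemplars(role: str, source: str, output: str) -> list[str]:
    """Return the known failure modes whose signals fire for this output."""
    return [
        entry["failure"]
        for entry in ANTI_EXEMPLARS.get(role, [])
        if entry["signal"](source, output)
    ]
```

A firing signal does not prove the failure occurred; it marks the output for the stop-and-review behaviour the specification prescribes.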
Why this problem does not get talked about
The AI industry has a strong incentive to attribute failures to model capability. Capability failures are solved by better models, larger context windows, improved training. Those are products; the solution can be sold.
Specification failures are solved by better thinking about what you want the agent to do. That is not a product. It is a craft, and it requires the person deploying the agent to think rigorously about requirements before deployment — which is considerably harder than assuming the model will figure it out.
There is a principle I have been applying in the WAT system since the first version: probabilistic decisions belong to agents, deterministic decisions belong to tools. What that means in practice is that anything requiring judgement — interpretation, synthesis, prioritisation, quality assessment — is agent work. Anything requiring precise, repeatable execution — data retrieval, format checking, export generation — is a tool call. The agent does not attempt deterministic work through inference. It calls the tool.
The specification has to encode this distinction. When it does not, agents attempt deterministic work through inference, produce variable results, and the failures look like model failures. They are not.
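The split can be sketched as a dispatcher that routes deterministic tasks to tool functions and everything else to the agent. All names here are hypothetical, and `call_agent` stands in for whatever model invocation a system uses; this is a sketch of the principle, not a production router.

```python
import re

def tool_check_spelling_variants(text: str) -> list[str]:
    """Deterministic tool: find American spellings from a fixed list.
    Same input, same output, every time — no inference involved."""
    american = ["color", "organize", "analyze"]
    return [w for w in american if re.search(rf"\b{w}\b", text, re.IGNORECASE)]

# Tasks with exact, repeatable answers are registered as tools.
DETERMINISTIC_TOOLS = {
    "spelling_variant_check": tool_check_spelling_variants,
}

def dispatch(task: str, payload: str, call_agent):
    # Deterministic work never goes through inference: it is a tool call.
    if task in DETERMINISTIC_TOOLS:
        return DETERMINISTIC_TOOLS[task](payload)
    # Judgement work (interpretation, synthesis, prioritisation) goes
    # to the agent.
    return call_agent(task, payload)
```

The point of the registry is that the routing decision is made by the specification author, once, rather than by the agent at inference time.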
The input/output contract
The single most useful discipline I have applied to agent specification is treating every agent as having a formal interface: defined inputs and defined outputs.
Before I deploy any agent in the WAT system, I define: what does this agent receive, in what format, from what source? What must it produce, in what format, to what downstream system? The inputs and outputs are defined before the agent is written. The agent specification then tells the agent how to transform the inputs into the outputs within its defined scope.
This is not novel. It is basic software engineering practice applied to agents. The reason it matters more for agents than for traditional software is that agents, unlike traditional software, will improvise when the interface is ambiguous. A function with unclear inputs will fail with a type error. An agent with unclear inputs will make an assumption and proceed. The assumption may be right or wrong, and you have no way of knowing which without testing every boundary case — which is only possible if you have defined the boundary cases in the specification.
The input/output contract is the most reliable guard against improvisation at the specification boundary.
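Enforcing the output side of the contract can be as simple as a validator that rejects anything beyond the agreed schema. A hedged sketch, assuming the quality-checker's JSON output described earlier; the contract shape and function names are illustrative:

```python
import json

# Hypothetical output contract: downstream systems accept exactly these
# keys with these types — nothing extra, however helpful.
OUTPUT_CONTRACT = {"flags": list, "score": int}

def validate_output(raw: str) -> dict:
    """Reject any agent output that deviates from the contract.

    An agent that 'helpfully' adds a summary field, or returns prose
    instead of JSON, fails here instead of reaching downstream systems.
    """
    data = json.loads(raw)  # raises on prose instead of JSON
    if set(data) != set(OUTPUT_CONTRACT):
        raise ValueError(f"contract violation: keys {sorted(data)}")
    for key, expected in OUTPUT_CONTRACT.items():
        if not isinstance(data[key], expected):
            raise ValueError(
                f"contract violation: {key} is not {expected.__name__}"
            )
    return data
```

The validator converts silent improvisation into a loud, diagnosable failure at the interface, which is where you want it.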
What this means if you are deploying agents
I write for practitioners and for board-level decision-makers who are approving AI deployments without hands-on involvement in the specification process. Both audiences have a relevant takeaway.
For practitioners: the time you spend on specification before deployment is repaid many times over in reduced debugging time and more predictable output. The specification is not documentation for humans. It is instruction for the agent. Treat it with the rigour you would treat any other system specification.
For boards: when you are approving an AI agent deployment, the technical quality of the model is less relevant to governance outcomes than the quality of the specification. Ask to see it. Ask specifically: what does this agent do? What does it explicitly not do? What approval is required before it takes irreversible actions? What does it do when it encounters a situation outside its scope? If those questions cannot be answered clearly, the deployment is not ready for approval.
The model will do what it is capable of doing. What it chooses to do, and when, is determined by the specification. That is where AI governance earns its keep.
The Board AI Governance Framework includes a section on agent deployment approval criteria — the specific questions boards should require answers to before approving agentic AI deployments. It is built from operational experience with a production multi-agent system, not from theoretical governance principles.
For boards or organisations building or overseeing agentic AI systems, contact Steven directly.