Decomposing Complex Work Across AI Agents: What Actually Works

The instinct when building AI systems is to make each agent as capable as possible. One highly capable agent that can handle everything seems simpler than eight specialised agents that have to coordinate. Fewer moving parts. Fewer interfaces. Fewer failure points.

This instinct is wrong. Or more precisely: it is right up to a threshold of complexity, and above that threshold it produces systems that fail in ways that are very hard to diagnose.

I know this because I built it the wrong way first.


The early version of my AI system — the WAT system, which now runs 24 agents across seven functional teams — assigned complex multi-stage work to individual agents. A content brief that required research, then structuring, then writing, then quality checking was given to one agent. That agent did all four stages. Sometimes it did them well. When it did not, diagnosing where the process had broken down required re-reading the entire output, working backwards through the logic, and inferring where the agent had gone off course. It was opaque, slow, and not reliably fixable because the same agent would fail differently each time.


The decomposition principle

The shift in thinking was this: decomposition is not about capability. It is about accountability.

When one agent does everything, the accountability for every decision is mixed together in a single output. When a research agent produces a research brief and passes it to a writing agent, there is a clear interface between what the research agent is accountable for and what the writing agent is accountable for. If the final content is factually wrong, the failure is in the research brief or in the writing agent’s handling of it — those are two distinct, diagnosable questions. If one agent did everything, you cannot separate them.

The accountability principle led directly to the team structure in the WAT system. The 24 agents are organised into seven functional teams: Research, Development, Content, Finance, Revenue, QA, and HR. Within each team, agents have specific, non-overlapping roles. The content team has a strategist, a data collection specialist, a content writer, a linter, and an image sourcing specialist. Each does one thing. Each produces a defined output that is the input for the next stage.

This is not novel thinking. It is how every functioning production system — software, manufacturing, financial operations — has been organised for decades. The principle transfers unchanged when the workers are AI agents.


Interface-first delegation

The rule I now follow, without exception, before delegating any work to an agent team: define the interface before you define the agents.

Interface-first means: before I write any agent specification, I define what inputs the agent receives (format, source, required fields) and what outputs it must produce (format, destination, required fields). The agent specification is then written to transform specified inputs into specified outputs within a defined scope. The interface is the contract. The specification is the instruction.

This sequencing matters because it prevents a specific failure mode: agents that produce outputs the downstream agent cannot use. In a well-intentioned multi-agent system built without interface definitions, agents tend to produce outputs that are technically complete but formatted for human consumption rather than machine consumption. The downstream agent receives a rich, contextually detailed output and cannot extract the specific fields it needs to proceed. The pipeline stalls at an interface problem that was never formally specified.

In the WAT system, every agent interface is defined in the specification before the agent is written. The data collection agent produces a structured data object with specific fields. The content writing agent receives that object and maps from specified fields to specified sections. Neither agent makes decisions about format. Format is defined at the interface.
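The contract-before-agent sequencing can be sketched in code. A minimal illustration in Python, with hypothetical field names standing in for the WAT system's actual schemas, which are not published here: the interface is declared first, validated at the handoff, and neither agent decides the format.

```python
from typing import TypedDict

# Interface contract: defined before either agent is written.
# Neither agent decides the format; the format lives here.
# Field names are illustrative, not the WAT system's real schema.
class ResearchBrief(TypedDict):
    topic: str
    key_points: list[str]
    sources: list[str]

class DraftSections(TypedDict):
    introduction: str
    body: str

REQUIRED_FIELDS = set(ResearchBrief.__annotations__)

def validate_brief(brief: dict) -> ResearchBrief:
    """Reject any upstream output that does not satisfy the contract."""
    missing = REQUIRED_FIELDS - brief.keys()
    if missing:
        raise ValueError(f"brief missing required fields: {sorted(missing)}")
    return brief  # type: ignore[return-value]

def write_sections(brief: ResearchBrief) -> DraftSections:
    """The writing stage maps specified fields to specified sections."""
    return {
        "introduction": f"Overview of {brief['topic']}.",
        "body": "\n".join(brief["key_points"]),
    }
```

The validation step is where "formatted for human consumption" outputs fail fast, at the interface, rather than stalling the downstream agent.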


What the seven teams actually do

I will describe the team structure concretely, because the abstract principle benefits from a specific example.

Research Team. Agents that identify market opportunities, analyse competitive positions, and surface data signals. Their output is structured research briefs that other teams consume. They do not write content and they do not make editorial judgements.

Development Team. Agents that write, test, and deploy code. Technical architecture, database management, deployment verification. They do not collect data and they do not produce commercial strategy.

Content Team. Agents that produce editorial content — blog posts, service descriptions, product pages. The content team has clear internal sequencing: Cora (strategy and brief) to Danni (data collection) to Kelly (writing) to Zoe (quality audit). Each stage has defined inputs and outputs.

Finance Team. Agents that monitor costs, produce financial reports, and evaluate budget allocation. They have read access to spend data and produce reports. They do not have approval authority over spending.

Revenue Team. Agents that manage SEO, affiliate strategy, and commercial positioning. They operate within defined density limits and placement rules. They do not write editorial content.

QA Team. Agents that audit outputs across all other teams. They have no production access. They flag, report, and escalate. They do not fix.

HR Team. Agents that manage agent specifications, governance, and structural reviews. They do not have operational authority but they can flag boundary violations for human review.

The separation of audit (QA) from production (all other teams) is not accidental. A QA agent that also has production access is a potential conflict of interest in exactly the same way that a combined audit/operations function is in any organisation. The governance principle transfers.
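The internal sequencing of a team such as Content can be expressed as an explicit pipeline: each stage is one agent, each output is the next stage's input, and a failure names the stage that produced it. A sketch, with illustrative stage functions standing in for the real agents:

```python
from typing import Callable

# Each stage is one agent: a named transform with a defined input and
# output. Stage names follow the content team's sequencing; the lambda
# bodies are placeholders for illustration only.
Stage = tuple[str, Callable[[dict], dict]]

def run_pipeline(stages: list[Stage], job: dict) -> dict:
    """Run stages in order, recording which stage last touched the job.

    If a stage fails, the exception names it, so diagnosis happens at
    the interface rather than by re-reading one monolithic output."""
    for name, fn in stages:
        try:
            job = fn(job)
        except Exception as exc:
            raise RuntimeError(f"stage '{name}' failed: {exc}") from exc
        job["_last_stage"] = name
    return job

content_team: list[Stage] = [
    ("cora_brief",  lambda j: {**j, "brief": f"Brief for {j['topic']}"}),
    ("danni_data",  lambda j: {**j, "data": ["signal-1", "signal-2"]}),
    ("kelly_draft", lambda j: {**j, "draft": f"{j['brief']}: {j['data']}"}),
    ("zoe_audit",   lambda j: {**j, "approved": "draft" in j}),
]
```

The point of the `_last_stage` bookkeeping is diagnosability: when the final content is wrong, the question "which interface did the bad artefact cross?" has a recorded answer.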


The failure modes of poor decomposition

There are three decomposition failures I see consistently in AI systems.

Under-decomposition. One agent is given a scope that is too large for reliable execution. The agent improvises connections between sub-tasks that should have been explicit interfaces. Outputs are inconsistent because the implicit sequencing varies between runs.

Over-decomposition. The work is broken into so many small agents that the coordination overhead dominates. Each interface is a potential failure point. The orchestration logic becomes more complex than the agent work. This is uncommon but I have seen systems where 15 agents were doing work that 4 could handle with better interface design.

Boundary confusion. Two agents have overlapping scope definitions, both attempt the same work, and produce conflicting outputs. The orchestration layer receives two incompatible outputs and has no principled way to resolve them. This failure is entirely a specification problem — the boundaries were not drawn clearly enough at the decomposition stage.

The diagnostic question for each: was the decomposition designed around accountability (each output has a single, identifiable owner), or around capability (what each agent can technically do)? Capability-based decomposition produces the most common failures. Accountability-based decomposition is harder to design but produces more reliable systems.
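Boundary confusion in particular is detectable at specification time: if each agent declares its scope explicitly, overlapping claims can be rejected before any agent runs. A sketch, assuming scopes are declared as sets of task types (the agent and task names are illustrative):

```python
# Each agent declares the task types it owns. Overlap is a specification
# error, caught before any work is routed. Names are illustrative.
AGENT_SCOPES = {
    "research_agent": {"market_analysis", "competitor_scan"},
    "content_writer": {"blog_post", "service_page"},
    "qa_auditor":     {"output_audit"},
}

def check_boundaries(scopes: dict[str, set[str]]) -> None:
    """Fail loudly if two agents claim the same task type."""
    owner: dict[str, str] = {}
    for agent, tasks in scopes.items():
        for task in tasks:
            if task in owner:
                raise ValueError(
                    f"boundary confusion: '{task}' claimed by "
                    f"{owner[task]} and {agent}")
            owner[task] = agent

def route(task_type: str, scopes: dict[str, set[str]]) -> str:
    """Route a task to the single agent whose scope covers it."""
    for agent, tasks in scopes.items():
        if task_type in tasks:
            return agent
    raise LookupError(f"no agent scoped for '{task_type}'")
```

Running the boundary check at specification time, rather than discovering conflicting outputs at run time, is exactly the shift from capability-based to accountability-based design.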


The human in the decomposition

One design decision I want to be explicit about: I am not removed from the system. I am in it.

The 24 agents operate autonomously on routine tasks within their defined scopes. When a task requires a decision outside any agent’s defined authority — spending above a budget threshold, architectural changes to the system itself, irreversible actions on production databases — it escalates to human review. That review is me.

This is not an efficiency compromise. It is a governance requirement. The approval gates for irreversible actions exist because the cost of an autonomous agent making an incorrect irreversible decision is higher than the cost of the 30 seconds required for human review. The decomposition design preserves human authority over the decisions where it matters.
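The escalation rule can be made mechanical: routine actions within scope execute autonomously, while anything irreversible or above a budget threshold is queued for human review. A minimal sketch, with an assumed threshold and an assumed list of irreversible action names (the real values are policy choices, not published here):

```python
from dataclasses import dataclass, field

BUDGET_THRESHOLD = 100.0   # illustrative; the real threshold is a policy choice
IRREVERSIBLE = {"drop_table", "delete_production_data", "architecture_change"}

@dataclass
class ApprovalGate:
    """Queue irreversible or over-threshold actions for human review."""
    pending: list[dict] = field(default_factory=list)

    def submit(self, action: str, cost: float = 0.0) -> str:
        if action in IRREVERSIBLE or cost > BUDGET_THRESHOLD:
            self.pending.append({"action": action, "cost": cost})
            return "escalated"   # waits for human review
        return "executed"        # autonomous, within defined scope
```

The gate encodes the governance question directly: the set of irreversible actions and the threshold are explicit, auditable values rather than judgement left inside an agent.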

Boards evaluating AI agent deployments should ask the same question about any system in their organisation: where are the irreversible actions, and who has authority over them? If the answer is “the agent decides autonomously,” that is either a governance gap or an evidence-based risk acceptance. Both deserve scrutiny.


The Board AI Governance Framework includes a decision structure for AI deployment approval that covers agent scope, interface definitions, and human oversight requirements. It is written for directors, not engineers.

For organisations designing or reviewing multi-agent AI architectures, contact Steven directly.

Steven Vaile

Board technology advisor and QSECDEF co-founder. Writes on AI governance, quantum security, and commercial strategy for boards and deep tech founders.