Building in public: AI systems for governance thinking.

This is a record of what I am building, testing, and learning about AI systems, from both technical and business standpoints. It is not a product page. Not a consulting pitch. A working log of someone who thinks the best way to advise on AI governance is to have made the decisions yourself, in production, with real consequences.

I run a 24-agent AI orchestration system commercially. It processes real data, makes routing decisions, runs quality and training loops, and handles complex workflows across multiple concurrent projects. I have been building it for AI operations, and it is licenced to my own companies.

This page covers what I learn from operating it.

Why build in public?

There is a specific irony in being a board advisor on AI governance who has never actually built an AI system. Most AI governance advice comes from one of two places: regulatory frameworks (which tell you what to do) or consulting playbooks (which tell you what others have done). Neither tells you what it actually feels like to make the architecture decisions, discover the failure modes, and fix them under operational pressure.

I am not a software engineer. I have not written the underlying code for most of what the system does. What I have done is make every governance decision the board would face in a real deployment: what the system is authorised to do autonomously, where humans stay in the loop, how failures are logged and reviewed, how cost is managed at scale, how quality is enforced across agents with different roles and different risk profiles.

Those are board governance questions. And I have had to answer them for a live system, not a hypothetical one.

Current work

The WAT system. WAT stands for Workflows, Agents, Tools. The architecture separates three things that most AI deployments conflate: the process logic (Workflows, written as Markdown SOPs), the probabilistic reasoning layer (Agents, Claude instances with specific roles, memories, and authority models), and the deterministic execution layer (Tools, scripts that do exactly what they are told).

The reason this separation matters for governance is that it makes accountability legible. A board can audit a workflow. It can review an agent's authority model. It cannot audit a large language model's internal reasoning. The WAT architecture is designed to keep the auditable parts auditable.
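To make the separation concrete, here is a minimal Python sketch. Every name in it is hypothetical; nothing below is taken from the actual WAT codebase. The point it illustrates is that the workflow and the authority model are plain, reviewable data, while only the agent layer involves probabilistic reasoning.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Tool:
    """Deterministic execution layer: does exactly what it is told."""
    name: str
    run: Callable  # a plain function, no model access

@dataclass
class Agent:
    """Probabilistic reasoning layer: a model instance with a role and
    an explicit authority model that a board can review line by line."""
    role: str
    may_act_autonomously: set[str] = field(default_factory=set)
    requires_human_signoff: set[str] = field(default_factory=set)

@dataclass
class Workflow:
    """Process logic: an auditable sequence of named steps,
    mirroring a Markdown SOP."""
    name: str
    steps: list[str]

def authorised(agent: Agent, step: str) -> bool:
    """A step outside the agent's autonomous scope needs a human."""
    return step in agent.may_act_autonomously
```

The governance-relevant property is that `authorised` is a deterministic check over declared data, not a model judgement: the audit question "was the agent allowed to do this?" has a mechanical answer.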

Quality and training loops. The system includes a scoring mechanism: agents rate each other's outputs on accuracy, completeness, and usability. High-scoring outputs are promoted to exemplar libraries. Failure patterns are logged and used to brief agents at the start of subsequent runs. This is a governance mechanism, not just a quality control one. It creates a record of what the system was told, what it produced, and whether the output met the required standard.
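The scoring-and-promotion loop described above can be sketched in a few lines. The threshold value and field names here are assumptions for illustration; the real system's scoring rubric is not reproduced here.

```python
EXEMPLAR_THRESHOLD = 4.0  # assumed cut-off for illustration only

exemplars: list[dict] = []    # high-scoring outputs, reused as reference examples
failure_log: list[dict] = []  # failure patterns, briefed to agents on later runs

def review(output: str, scores: dict[str, float]) -> None:
    """Record a peer review and route the output accordingly.

    `scores` holds ratings on accuracy, completeness, and usability."""
    mean = sum(scores.values()) / len(scores)
    record = {"output": output, "scores": scores, "mean": mean}
    if mean >= EXEMPLAR_THRESHOLD:
        exemplars.append(record)   # promoted to the exemplar library
    else:
        failure_log.append(record) # logged as a failure pattern

review("draft report", {"accuracy": 4.5, "completeness": 4.0, "usability": 4.5})
```

Both lists together form the record the paragraph describes: what was produced, how it was scored, and what happened to it next.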

Cost routing. Running 24 agents across multiple concurrent projects involves non-trivial token costs. The system routes tasks to models based on complexity and required capability. Smaller, cheaper models handle deterministic tasks; more capable models handle tasks that require genuine reasoning. This is an investment governance question: what is the right level of AI capability for this specific decision? Boards will face versions of this question as AI deployment scales inside their own organisations.
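A routing decision of this kind reduces to a small, auditable function. The tier names and prices below are placeholders, not the system's actual models or costs; the sketch only shows the shape of the decision.

```python
# Assumed tiers and per-token prices, for illustration only.
MODEL_TIERS = {
    "small": {"cost_per_1k_tokens": 0.0002},
    "mid":   {"cost_per_1k_tokens": 0.0030},
    "large": {"cost_per_1k_tokens": 0.0150},
}

def route(task_complexity: str, needs_reasoning: bool) -> str:
    """Pick the cheapest tier that meets the task's capability need."""
    if needs_reasoning:
        return "large"              # genuine reasoning justifies the cost
    if task_complexity == "moderate":
        return "mid"
    return "small"                  # deterministic tasks go to the cheapest tier
```

Because the routing rule is explicit, the investment question in the paragraph above ("what is the right level of AI capability for this decision?") becomes something a board can inspect rather than infer.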

What this means for governance advice

The most useful thing I can tell a board about AI governance is not what the EU AI Act requires or what the NIST framework recommends. It is this: the governance problems that actually matter are not the ones your framework covers. They are the ones that emerge when the system is operating under real conditions, with real data and real operational pressure.

I know where the gaps are because I have had to close them myself. That is a different kind of advisory credential from a certification or a framework.
