Counterfactual Reasoning in AI: What Boards Need to Understand When an AI System Fails

Imagine your company’s AI system produces a flawed output. A credit application is declined incorrectly. A hiring decision is made on the basis of a biased recommendation. A fraud detection model flags a legitimate transaction and locks out a customer. The incident reaches the board.

The first question in the room is almost always: “What went wrong?”

It is the wrong question.


“What went wrong” produces a description of the failure — the output that was incorrect, the data point that was missing, the edge case the model had not seen before. It is a useful description. But it does not tell you whether the governance structure that was supposed to prevent this failure was in place. It does not tell you whether the same failure is about to happen in a different part of the system. And it does not tell you what would need to change to prevent it recurring. For those three questions, you need counterfactual reasoning. And most AI governance frameworks in use today do not teach boards how to do it.


What counterfactual reasoning actually is

I am going to simplify this significantly, and I will acknowledge the simplification upfront.

Counterfactual reasoning asks: what would have had to be different for this outcome not to have occurred? Not what description fits the failure, but what change — upstream, in the system, in the governance process, in the data — would have produced a different result. It is reasoning backwards through a causal chain, not reasoning forwards from symptoms.

The difference matters. Imagine a simple domestic scene: you reach into a bag of sugar and find it is empty. “What went wrong” is: there is no sugar. Counterfactual reasoning asks: what would have had to be different? If the shopping list had included sugar, you would have bought it. If the household had a rule to check stock before writing the list, the list would have included it. The failure is at the stock-checking stage, not the shopping stage. You fix the process, not the shopping trip.

That is a significant oversimplification of a formal methodology. But the principle — trace back through the causal chain to the decision that could have changed the outcome — is exactly what makes AI governance functional rather than decorative.


Why this matters specifically for AI

In traditional technology governance, failure modes are relatively deterministic. A system crashes because a specific component failed. The postmortem identifies the component. The fix is straightforward.

AI systems do not fail that way. They fail probabilistically, at the edges of their training distribution, when their inputs drift from what they were built on, or when they are applied to decision contexts their designers did not anticipate. The failure is rarely in a single component. It is usually in the relationship between the model, the data it received, the deployment context, and the oversight mechanism that was supposed to catch problems before they became consequential.

I spent a significant part of my career at companies that built causal analysis and root cause analysis tools — RiverSoft, SMARTS, Voyence, all in that space, all subsequently acquired by IBM or EMC. The product categories were different. The underlying problem was the same: complex systems fail in ways that are invisible to anyone looking only at the symptom layer. You have to trace back through the dependency chain to find the governance failure that enabled the surface failure to occur.

When I built a 24-agent AI system that now runs in commercial production, every governance structure I put in place was built around this principle. Not “what output did this produce” but “what decision, what threshold, what oversight mechanism, what data quality gate could have prevented this output from ever reaching a consequential stage.”

Boards are not expected to understand the model architecture. But boards are expected to have approved a governance structure that applies this kind of reasoning at the right points in the deployment cycle.


The four counterfactual questions a board should require

These are not technical questions. They are governance questions. A board does not need a PhD in machine learning to ask them. It needs the discipline to require answers.

Question 1: If this AI system produces an incorrect output, is there a human review stage before that output becomes consequential?

This is the human-in-the-loop question, but stated correctly. Not “do we have human oversight” — which most governance presentations will answer with a yes — but specifically: at what point in the process does a human have the ability to intervene before the AI output becomes irreversible? If the answer is “after the decision is logged,” that is not oversight. That is a postmortem function.
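
For readers who want to see where that intervention point sits, here is a minimal sketch of a pre-consequence review gate. Every name, threshold, and class in it is hypothetical and invented for illustration; the only point it makes is that the hold happens before the decision takes effect, not after it is logged.

    # Minimal, hypothetical sketch of a pre-consequence review gate.
    # Names, the 0.90 threshold, and ReviewerQueue are illustrative assumptions.
    from dataclasses import dataclass
    from typing import Optional


    @dataclass
    class ModelOutput:
        applicant_id: str
        decision: str        # e.g. "approve" or "decline"
        confidence: float    # model's own confidence score, 0.0 to 1.0


    @dataclass
    class GateResult:
        applied: bool
        reviewed_by: Optional[str]   # None means no human saw it before it took effect


    class ReviewerQueue:
        """Stand-in for whatever case-management tool routes work to named reviewers."""
        def assign(self, case_id: str) -> str:
            return f"credit-review-desk:{case_id}"


    def apply_decision(output: ModelOutput, queue: ReviewerQueue) -> GateResult:
        # Adverse or low-confidence outputs are held, not applied.
        # The gate sits before the decision becomes irreversible, not after it is logged.
        consequential = output.decision == "decline" or output.confidence < 0.90
        if consequential:
            reviewer = queue.assign(output.applicant_id)
            return GateResult(applied=False, reviewed_by=reviewer)
        return GateResult(applied=True, reviewed_by=None)

The design choice that matters is the order of operations: the output is held first, and a specific, named reviewer is assigned before anything irreversible happens.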

Question 2: What data would have had to be different for this system to produce a different output?

This question is aimed at data governance, not model architecture. When an AI system fails, the board should ask the executive team to trace back to the data conditions that produced the failure. If the team cannot answer this question, the organisation does not have adequate data lineage for board-level accountability.
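
As a rough illustration of what adequate data lineage means at the level of a single decision, the sketch below shows the kind of record that would have to exist before the question can be answered. The field names are assumptions, not a standard schema; real systems will differ.

    # Hypothetical sketch of the lineage record a team would need in order to answer
    # Question 2 after an incident. Field names are assumptions, not a standard schema.
    from dataclasses import dataclass, field
    from datetime import datetime, timezone


    @dataclass
    class DecisionLineage:
        decision_id: str
        model_version: str
        input_snapshot: dict       # the exact feature values the model saw
        data_sources: dict         # where each feature came from, e.g. {"income": "crm-export"}
        source_refreshed_at: dict  # when each upstream source was last updated
        recorded_at: str = field(
            default_factory=lambda: datetime.now(timezone.utc).isoformat()
        )


    def record_lineage(decision_id: str, model_version: str,
                       features: dict, sources: dict, freshness: dict) -> DecisionLineage:
        # Persisted alongside every consequential decision, this is what lets the team
        # trace back to the data conditions that produced a given output.
        return DecisionLineage(decision_id, model_version,
                               dict(features), dict(sources), dict(freshness))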

Question 3: Was the governance structure that was supposed to prevent this failure actually in place before the incident?

This is the most important question and the one least likely to be asked in a board postmortem. The instinct is to discuss the failure and then discuss the fix. Counterfactual governance asks: not what fix do we need now, but was the governance that was supposed to prevent this already approved and implemented? If yes, why did it not function? If no, why was the deployment approved without it?

Question 4: What other AI deployments in the organisation share the same governance gap?

A single failure is an incident. A pattern of failures from the same governance gap is a systemic issue. This question extends the counterfactual from the specific failure to the governance architecture as a whole. It is the question that distinguishes a board that manages AI risk from one that manages AI incidents.
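
In practice this question reduces to an inventory check: which other deployments are missing the control that failed? A toy example follows, with invented deployment names and control flags, purely to show the shape of the check.

    # Toy example only: deployment names and control flags are invented.
    deployments = [
        {"name": "credit-scoring",  "human_review_gate": True,  "decision_lineage": True},
        {"name": "fraud-detection", "human_review_gate": False, "decision_lineage": True},
        {"name": "cv-screening",    "human_review_gate": False, "decision_lineage": False},
    ]


    def shares_gap(inventory, control):
        """Every deployment missing the control that failed in the incident."""
        return [d["name"] for d in inventory if not d[control]]


    # If the incident exposed a missing human review gate, ask where else it is missing.
    print(shares_gap(deployments, "human_review_gate"))   # ['fraud-detection', 'cv-screening']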


The governance failure this is designed to catch

The most common AI governance failure I see at board level is not negligence. It is mistaking description for demonstration. The board receives a compliance presentation, the presentation describes the governance framework, and the board approves it. The failure is that the framework describes what is in place (oversight committees, documentation processes, testing protocols) without demonstrating that any of those mechanisms would actually catch a governance failure before it became an incident.

A governance framework that cannot answer the four counterfactual questions above is a description of structure, not a demonstration of function. The structure and the function are not the same thing.

This is where my background in causal analysis is relevant in a way that a generalist AI governance framework is not. Most frameworks are taxonomic: here are the categories of risk, here are the controls, here are the documentation requirements. They are useful. They are also insufficient for boards that want to govern AI rather than simply describe their governance of it.

The counterfactual approach asks: if this governance structure had been in place and functioning when the failure occurred, would the failure have been prevented, caught before it became consequential, or would it still have reached the board as an incident? If the honest answer is “it would still have happened,” the governance structure needs redesigning, not documenting.


What this looks like in practice

In a board meeting where the CTO is presenting an AI deployment proposal, the counterfactual governance question is: “Walk me through the scenario where this system produces a significantly wrong output. At what point does a human review that output before it has consequences? Who is that human, what are they reviewing against, and what is the escalation path if they flag a problem?”

If the CTO cannot answer that question with specificity — specific human, specific review criteria, specific escalation path — the deployment proposal is not ready for board approval. Not because the technology is wrong, but because the governance architecture is incomplete.

This is not a high bar. It is a basic governance requirement. The fact that it is so rarely asked in board meetings is not because boards are incompetent. It is because the governance frameworks in most boardrooms do not prompt this question. They describe what oversight looks like in principle. They do not test whether the oversight mechanism is actually functional.


The Board AI Governance Framework gives boards a decision-making structure for evaluating AI deployments — including the oversight and escalation questions that standard compliance frameworks typically skip. It is a governance tool, not a compliance checklist, designed for boards that want to govern AI rather than simply document that they have.

For boards navigating a live AI deployment or an AI incident review, contact Steven directly to discuss independent advisory support.

Steven Vaile

Board technology advisor and QSECDEF co-founder. Writes on AI governance, quantum security, and commercial strategy for boards and deep tech founders.