What Causal Analysis Is and Why Boards Should Care About It More Than They Do

Let me describe what causal analysis is not before I describe what it is.

It is not root cause analysis as most people understand that term — the postmortem exercise where you work backwards from an incident, identify “the root cause,” and write a recommendation to prevent recurrence. That process is superficially similar to causal analysis but produces different results, because it typically stops at the first plausible cause rather than continuing to the structural failure that enabled that cause to occur.

It is not statistical correlation analysis, which identifies relationships between variables without establishing direction or mechanism.


It is not blame attribution, though it often identifies who was accountable for the decision that enabled the failure.

What causal analysis is: a formal methodology for identifying the directed causal relationships in a complex system — the specific decision, condition, or structural gap that is the necessary and sufficient antecedent of the observed outcome. Not “what could have caused this” but “what did cause this, and what would have had to be different for this outcome not to have occurred.”

The counterfactual is central. A causal analysis is complete when you can answer: if X had been different, would the outcome have been different? If yes, X is causal. If no, X is correlated but not causal.
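
To make the counterfactual test concrete, here is a minimal sketch in Python. It is purely illustrative: the incident model and every variable name are hypothetical, not drawn from any real system. The test flips one antecedent at a time, holds everything else fixed, and checks whether the outcome changes.

    # Illustrative toy model, not a real incident: the outcome occurs when a
    # model runs outside its validated range and no trained reviewer catches it.
    def incident(scope_reviewed: bool, reviewer_trained: bool, traffic_shift: bool) -> bool:
        outside_range = traffic_shift and not scope_reviewed
        return outside_range and not reviewer_trained

    # What actually happened.
    observed = dict(scope_reviewed=False, reviewer_trained=False, traffic_shift=True)

    def counterfactual_changes_outcome(factor: str) -> bool:
        # Flip one factor, hold the rest fixed, re-evaluate the outcome.
        world = dict(observed, **{factor: not observed[factor]})
        return incident(**world) != incident(**observed)

    for factor in observed:
        verdict = "causal" if counterfactual_changes_outcome(factor) else "not causal"
        print(factor, "->", verdict)

In this toy model each factor individually flips the outcome; the joint-sufficiency discussion below covers the messier case where no single factor does.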


Where I learned this and why I still use it

I spent a significant part of my career at companies that built causal analysis and root cause analysis software. RiverSoft made network fault isolation tools — software that could trace a service outage back through thousands of correlated fault alerts to the specific physical or logical failure that caused it. SMARTS made the same category of product for telecommunications networks. Voyence applied it to military and enterprise network configuration management.

The technical implementation of causal analysis in those products was sophisticated — Bayesian networks, dependency modelling, real-time causation tracing at scale. But the underlying principle was simple enough to explain to a CFO in a room: the failure you can see is almost never the failure that caused the problem. The failure that caused the problem is upstream, in the dependency chain, and it is structural.
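
The flavour of that dependency reasoning fits in a few lines. What follows is a hypothetical sketch, not the actual RiverSoft or SMARTS algorithm: each candidate root fault predicts a signature of downstream alarms, and the fault whose signature best explains the observed alarms is the one isolated.

    # Hypothetical sketch of dependency-based fault isolation. Each candidate
    # root fault has a predicted "signature" of downstream alarms; pick the
    # fault that best explains what was actually observed.
    DEPENDS_ON = {                     # service -> components it depends on
        "checkout": {"db", "router"},
        "search":   {"router"},
        "billing":  {"db"},
    }

    def signature(fault):
        # Alarms we would expect if `fault` failed: every dependent service.
        return {svc for svc, deps in DEPENDS_ON.items() if fault in deps}

    def isolate(observed_alarms):
        candidates = {c for deps in DEPENDS_ON.values() for c in deps}
        # Score each candidate by explained alarms minus false predictions.
        def score(fault):
            sig = signature(fault)
            return len(sig & observed_alarms) - len(sig - observed_alarms)
        return max(candidates, key=score)

    # Two services are alarming; the isolated fault is upstream of both.
    print(isolate({"checkout", "billing"}))   # -> db

The isolated answer is the shared upstream dependency, not either of the services that raised the alarms.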

I have applied this methodology to governance problems ever since. Not because governance failures are technically similar to network faults, but because complex systems fail in the same ways regardless of domain. The governance structure that prevents catastrophic failure in a telecommunications network operates on the same principles as the governance structure that prevents catastrophic failure in an AI deployment.

When I advise boards on AI governance, causal analysis is the analytical framework I bring to every incident and every governance review. Not because other frameworks are wrong, but because causal analysis identifies the structural governance failure rather than the proximate cause — and it is the structural failure that, if unaddressed, produces the next incident.


Why causal analysis matters specifically for AI governance

AI systems fail in ways that are particularly amenable to causal analysis, for a specific reason: AI failures are almost always failures in the relationship between the system and its operational context, not failures in the system’s internal logic.

An AI system that fails in production is not usually broken in the way a traditional software system breaks. Its code is correct. Its models are functioning as designed. The failure is that the system was deployed in a context that differed from its validated operating range in ways that were not anticipated by the governance structure.

The governance question is: what causal chain led to the system being deployed in a context its governance structure was not designed to handle?

That question has a structural answer, and the answer is almost never “the AI system failed.” The answer is typically one of:

  • The deployment scope approval process did not include a review of operational context against validated operating range
  • The human oversight mechanism was designed for the expected failure mode, not the actual failure mode
  • The data quality monitoring did not include the specific data shift that changed the system’s operating context
  • The board approved the deployment without being told that the validated operating range had constraints that the planned deployment exceeded

Each of these is a governance structure failure, not a technology failure. Causal analysis identifies which one it is, which determines which part of the governance structure needs to change.


The five whys and its limitations

The “five whys” methodology — ask “why” five times, and the fifth answer is typically the root cause — is a widely used approximation of causal analysis. It is useful for simple systems with linear failure chains. For AI governance, it is usually insufficient.

The reason: AI governance failures are typically not linear. They are structural — the failure occurs because multiple conditions were simultaneously present that individually would not have caused the failure. A model deployed outside its validated range, reviewed by a human who was not trained on the model’s failure modes, producing outputs in a context where no escalation mechanism existed — any one of these conditions might not have produced a failure. All three together did.

The five whys methodology, applied to this failure, will identify one of the three conditions as “the root cause” and recommend fixing it. Causal analysis identifies all three conditions as jointly causally sufficient, and the governance structure as the enclosing failure that allowed all three to exist simultaneously.
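
The contrast can be made mechanical. In the hypothetical sketch below, with made-up condition names, a caricature of five whys walks the chain and returns the first cause it reaches, while the causal enumeration returns every minimal set of conditions whose joint presence reproduces the incident.

    from itertools import combinations

    # Hypothetical condition names for the incident described above.
    CONDITIONS = ("outside_validated_range", "untrained_reviewer", "no_escalation_path")

    def incident(present):
        # Toy model: the failure needs all three conditions at once.
        return set(CONDITIONS) <= set(present)

    def five_whys(present):
        # Caricature of five whys: walk the chain, stop at the first cause.
        return next(iter(present))

    def minimal_sufficient_sets(conditions):
        # Every minimal subset whose joint presence reproduces the incident.
        found = []
        for r in range(1, len(conditions) + 1):
            for subset in combinations(conditions, r):
                if incident(subset) and not any(set(f) <= set(subset) for f in found):
                    found.append(subset)
        return found

    print(five_whys(CONDITIONS))                # one condition, picked arbitrarily
    print(minimal_sufficient_sets(CONDITIONS))  # only the full triple qualifies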

The board governance implication: when an AI incident reaches the board, the question “what was the root cause” typically produces a single answer that justifies a single remediation. The question “what causal conditions were jointly sufficient for this outcome, and which of those conditions should a functioning governance structure have prevented?” produces a governance review.

The second question is harder to answer and more useful to ask.


What boards can do with this

Boards do not need to become practitioners of formal causal analysis. The methodology requires specialist knowledge to apply rigorously. What boards can do is apply the counterfactual principle.

When an AI incident is presented to the board — or, more proactively, when an AI deployment is proposed — the counterfactual question is: if our governance structure were fully functional, would this have been caught before it became an incident?

If yes, the governance structure needs to be tested rather than described. The board should ask for evidence that the oversight mechanisms it has approved are actually working under realistic conditions.

If no — if even a fully functional governance structure would not have caught this — the structure needs to be redesigned. The incident reveals a category of failure the governance structure was not built to catch.

This is the value of causal analysis applied to AI governance: it makes the difference between a governance structure that is adequate for anticipated failures and one that is adequate for unanticipated ones. The second is harder to build. It is the one worth building.


The Board AI Governance Framework incorporates a causal governance review — the specific questions and counterfactual analysis structure that boards can apply to AI incident reviews and deployment approvals to identify structural governance gaps rather than proximate causes.

For boards seeking advisory support on AI governance and causal analysis methodology applied to technology risk, contact Steven directly.

Steven Vaile

Board technology advisor and QSECDEF co-founder. Writes on AI governance, quantum security, and commercial strategy for boards and deep tech founders.