Imagine you are looking at a large jigsaw puzzle. Several pieces are missing from the centre. The picture makes no sense without them, but the pieces themselves are somewhere under the table, invisible from where you are standing.
That is roughly what a governance failure looks like from inside the boardroom. The failure is visible. Its consequences are visible. The structural gap that allowed it to occur — the missing pieces under the table — is not visible from the vantage point of the people being asked to respond to it.
Most governance processes are designed for the visible part. They document the failure, describe the consequences, assign accountability, and produce a remediation plan. These are useful activities. They address what happened and who is responsible.
They rarely address what made it possible. That question requires causal analysis, and most governance frameworks in regular use are not equipped to answer it.
What governance failure actually is
Let me be specific about terminology, because the term “governance failure” is used loosely, and the loose usage obscures the analysis.
A governance failure is not the incident itself. A data breach is an incident. An AI system producing a discriminatory output is an incident. A regulatory fine is a consequence.
The governance failure is the structural gap in the oversight mechanism that allowed the incident to occur, persist, or escalate without being caught. The governance failure exists before the incident. It is present during the incident. It is often still present after the remediation, unless the causal analysis identified it specifically.
The distinction matters because it determines what the remediation should address. If a data breach is treated as a technical failure, the remediation is a technical patch — better firewalls, updated software, stronger authentication. If a data breach is treated as a governance failure, the remediation addresses why the technical controls were insufficient, who was accountable for verifying them, what oversight mechanism was supposed to catch the gap, and why it did not function.
The second remediation addresses the structural condition. The first addresses the proximate cause. The first is cheaper and faster. It leaves the structural condition intact for the next incident.
Why boards treat symptoms instead of causes
This is not negligence. I want to be clear about that, because the usual implication of “failure to address root causes” is that decision-makers are failing at a basic task.
The mechanism is more subtle. Boards receive information about incidents in the form in which it is presented — typically a description of the incident and its consequences, followed by a remediation plan that addresses the technical or operational failure behind it. The presentation is formatted for approval, not for causal inquiry.
A board that approves the presented remediation plan has done what the governance process asked of it. What the governance process did not ask — because most governance frameworks are not designed to ask it — is: what structural gap in this board’s oversight enabled this incident, and does the remediation address that gap?
This question is harder to answer than “do we approve the technical remediation?” It requires a different kind of analysis — one that works backwards through the causal chain rather than forwards from the incident to the fix. It requires someone in the governance process to ask counterfactual questions: what would have had to be different for this incident not to have occurred? What in our governance structure should have caught this before it became an incident?
The question requires time, access to the full operational history, and methodological willingness to follow the causal chain into uncomfortable places — including into the board’s own oversight decisions.
The specific gap in most governance frameworks
My background is in causal analysis and root cause analysis software. I spent a significant portion of my career at companies — RiverSoft, SMARTS, Voyence, all subsequently acquired by IBM or EMC — that built software to trace failure chains in telecommunications networks and military systems. The software identified the specific upstream condition that was the necessary and sufficient antecedent of the observed fault.
Applied to governance, the same methodology asks: not what failed, but what was the governance condition that was necessary for the failure to be possible?
The specific gap I see most consistently in board governance is what I call proximity without function. The board has approved governance structures, and those structures are described in documentation. Board members believe the structures are working because they have been told the structures are in place. The structures exist; they are not tested. Whether they actually function — whether they would catch a failure of the type that has just occurred, or of a type that is about to occur — is typically not known.
Governance described is not governance verified. A documented escalation process that has never been tested under realistic conditions may fail under those conditions. A data quality monitoring programme that reports green across all metrics may be monitoring the wrong metrics for the specific failure mode that is about to materialise. The documentation does not tell you this. The causal analysis does.
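In miniature, that kind of failure-chain tracing looks like the following sketch. The condition names and the example chain are hypothetical, invented for this article; they are not the logic of any actual product:

```python
# Minimal sketch of failure-chain tracing. Given an observed incident and a map
# from each condition to the upstream conditions that enabled it, walk the chain
# backwards until reaching conditions with no antecedents of their own.
# All condition names here are hypothetical.

def trace_root_causes(incident, enabled_by):
    """Return the upstream conditions that have no further antecedents."""
    roots, frontier, seen = set(), [incident], set()
    while frontier:
        node = frontier.pop()
        if node in seen:
            continue
        seen.add(node)
        antecedents = enabled_by.get(node, [])
        if not antecedents:
            roots.add(node)          # nothing upstream: a candidate root cause
        else:
            frontier.extend(antecedents)
    return roots

# Hypothetical chain: the breach is the incident; the untested escalation
# process and the unassigned verification duty are the structural conditions.
enabled_by = {
    "data breach": ["weak authentication", "monitoring gap"],
    "weak authentication": ["control verification never assigned"],
    "monitoring gap": ["escalation process never tested"],
}
print(sorted(trace_root_causes("data breach", enabled_by)))
# → ['control verification never assigned', 'escalation process never tested']
```

The point of the sketch is the direction of travel: the analysis starts from the incident and works upstream, rather than starting from a fix and working forwards.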
The counterfactual test
The diagnostic tool I apply to governance failures — and that I recommend boards apply — is the counterfactual test.
When an incident is presented for board review, before approving the remediation plan, ask: if our governance structure had been fully functional and fully implemented, would this incident have been prevented, caught before it became consequential, or would it have reached this point regardless?
The answer has three possible forms.

First: yes, a fully functional governance structure would have caught this — which means the governance structure was not fully functional, and the remediation must address why.

Second: no, the governance structure as designed would not have caught this even if fully functional — which means the governance structure has a design gap, and the remediation must address the design.

Third: uncertain — the governance structure might have caught this, but it depends on conditions we cannot verify — which means the governance structure lacks the visibility required for the board to know whether it is functioning.
All three answers are more useful than approving a technical remediation without asking the question. All three identify what the governance review must address.
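For readers who prefer to see the three outcomes as a decision structure, here is a minimal sketch. The function and finding names are my own illustrative labels, not part of any formal framework:

```python
from enum import Enum

class Finding(Enum):
    IMPLEMENTATION_GAP = "structure should have caught this and was not functioning"
    DESIGN_GAP = "structure as designed could not have caught this"
    VISIBILITY_GAP = "board cannot verify whether the structure would have caught this"

def counterfactual_finding(would_have_caught):
    """Map the counterfactual question to one of the three findings.

    would_have_caught: True (yes, a fully functional structure would have
    caught the incident), False (no, not even if fully functional), or
    None (uncertain: depends on conditions that cannot be verified).
    """
    if would_have_caught is None:
        return Finding.VISIBILITY_GAP
    if would_have_caught:
        return Finding.IMPLEMENTATION_GAP
    return Finding.DESIGN_GAP

print(counterfactual_finding(True).value)
# → structure should have caught this and was not functioning
```

Whatever the answer, the board leaves the review with a named gap rather than an approved patch.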
What this looks like applied to AI governance
AI governance is where I most frequently apply causal analysis, because AI systems fail in ways that are particularly good at appearing to be technical failures when they are actually governance failures.
An AI system that produces a discriminatory output did not fail technically. It produced the output it was designed to produce, based on the data it was given, in a deployment context its designers did not fully anticipate. The technical system worked. The governance structure that was supposed to ensure the deployment context was validated against the model’s limitations — that was the failure.
The causal question is: what governance condition made it possible for this system to be deployed in a context its validation did not cover? Possible answers: the deployment scope review did not include a comparison of operational context against validated range. The human oversight mechanism was designed to catch the expected failure mode, not this one. The board approved the deployment without being told that the validation had a specific constraint that the deployment exceeded.
Each of these is a structural governance finding. Each points to a different element of the governance architecture that needs to change. The technical remediation — retraining the model, adjusting the output filter, expanding the test suite — is necessary but not sufficient. The structural remediation addresses the condition that made the technical failure possible.
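The first of those findings, a deployment scope review that never compared operational context against validated range, implies a concrete check. The attribute names below are invented for illustration:

```python
def outside_validated_range(deployment_context, validated_range):
    """List deployment attributes that fall outside the model's validated range.

    deployment_context: attribute -> value for the proposed deployment.
    validated_range:    attribute -> set of values covered by validation.
    An attribute absent from the validated range is also a gap: it was
    never examined at all.
    """
    return [attr for attr, value in deployment_context.items()
            if attr not in validated_range or value not in validated_range[attr]]

# Hypothetical example: validation covered UK/EU credit-scoring deployments.
validated_range = {"region": {"UK", "EU"}, "use_case": {"credit scoring"}}
deployment_context = {"region": "US", "use_case": "credit scoring", "language": "en"}
print(outside_validated_range(deployment_context, validated_range))
# → ['region', 'language']
```

A non-empty result is exactly the governance finding described above: the deployment exceeds a specific validation constraint, and the board should be told so before approval.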
A practical starting point for boards
Three questions that surface causal governance gaps without requiring specialist training:
First, when an incident is presented to the board, ask: what in our governance structure was supposed to prevent this, and did it function? If the team presenting the incident cannot answer this question, the causal analysis is not complete.
Second, when a governance framework is presented for approval, ask: what failure mode is this framework designed to catch that our previous framework was not catching? If the answer is “it comprehensively covers all the risks,” that is not an answer. Every framework has a designed coverage scope. Knowing what is outside that scope is as important as knowing what is inside it.
Third, once a year, test the escalation mechanisms rather than describing them. Pick a realistic scenario — an AI system producing an incorrect output that reaches a consequential stage, a data breach that is discovered externally rather than internally — and run through the governance structure with that scenario. Note where the escalation path is ambiguous, where the accountability is unclear, where the governance structure depends on a single person’s knowledge rather than a documented process.
The test will identify the proximity-without-function gaps before an incident does.
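The walk-through can be recorded as a simple structure, which makes the resulting findings explicit. The step attributes below are illustrative assumptions about what such a scenario exercise might track:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Step:
    name: str
    owner: Optional[str]   # None = accountability unclear
    tested: bool           # exercised under realistic conditions before?
    single_person: bool    # depends on one individual's knowledge?

def scenario_gaps(steps):
    """Collect proximity-without-function findings from a scenario walk-through."""
    gaps = []
    for step in steps:
        if step.owner is None:
            gaps.append(f"{step.name}: accountability unclear")
        if not step.tested:
            gaps.append(f"{step.name}: never tested under realistic conditions")
        if step.single_person:
            gaps.append(f"{step.name}: depends on a single person's knowledge")
    return gaps

# Hypothetical escalation path for an externally discovered data breach.
steps = [
    Step("detect and confirm incident", owner="security team",
         tested=True, single_person=False),
    Step("escalate to the board", owner=None,
         tested=False, single_person=True),
]
for gap in scenario_gaps(steps):
    print(gap)
```

The output of such an exercise is not a pass/fail grade; it is the list of places where the documented structure and the functioning structure diverge.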
For boards seeking a structured approach to AI governance that incorporates causal analysis methodology, the Board AI Governance Framework provides the decision-making structure and the counterfactual review questions that boards can apply to incident reviews and deployment approvals.
For independent advisory support on governance and causal analysis applied to AI risk, contact Steven directly.