Code you don't understand is a liability you can't defend

Reflecting on the Five Eyes security warning

Ken Mugrage

Published: June 25, 2026

When the leaders of the Five Eyes cyber security agencies issued their joint warning on AI and cyber risk this week (June 22), the headline was about speed: frontier models are compressing the time between a vulnerability being discovered and a working exploit existing. If the gap is closing, the argument goes, close it faster. Patch sooner, scan more and buy the tool that promises to keep pace.

While this may feel like the correct instinct given the urgency of the Five Eyes warning, it’s ultimately an incomplete one. Indeed, the agencies say as much themselves, noting that tools alone won’t necessarily deliver security.

So, what will deliver it? Warnings that urge caution too often skip the harder question: what are you defending in the first place, and do you still understand it well enough to defend it at all?

The window is the wrong thing to watch

The vulnerability-to-exploit window is real, but it’s a symptom, not the disease. Thoughtworks' own response to the Anthropic Mythos Preview makes the point well, highlighting that a model that can autonomously find and weaponise flaws at scale changes the economics of attack.

The answer isn’t to try to win a race you cannot win. It’s to change the terrain itself by prioritizing the handful of vulnerabilities that move real risk, containing the blast radius when something fails, and shrinking the surface an attacker can reach.

Look closely and you’ll realize none of that is really about the speed of our reactions; it’s about whether the people responsible for a system can reason about how it behaves. That’s the metric worth watching: how much of what you run someone could still explain. It might be called comprehensibility; whatever we decide to call it, though, it needs to be treated as an asset.

Comprehensibility is an asset, and it's depreciating

Comprehensibility is the degree to which the people who own a system can reason about its behaviour, locate its boundaries and predict what a change will do. It’s built slowly, by humans doing the work of understanding; like any asset, it depreciates if you don’t maintain it.

Kent Beck has described the underlying dynamic in a pretty elegant way: we’re accumulating code faster than we accumulate the trust to rely on it. When a model can produce a plausible module in seconds, the volume of software grows far quicker than the understanding that should accompany it. The result is cognitive debt — the widening distance between developers and the software they’re answerable for.

This is already the defining risk in legacy estates, where, as a recent piece on modernization in government and regulated industries observes, the most serious exposure is rarely the age of the technology. It’s the shrinking number of people who understand it, as undocumented behaviour piles up and the few who hold it in their heads move on. Generation at volume takes that slow, decades-long process and runs it at speed, across systems that are months old rather than thirty years old.

The connection worth drawing out is that the thing eroding your security posture and the thing eroding your team's grasp of the system are the same thing. They’re not two problems to be managed separately.

This is a security problem, not a quality one

It might be tempting to file comprehensibility under code quality, a virtue to aspire to once the urgent work is done. However, that understates it. Defence at machine speed is built on comprehension at every step:

Containing a breach means knowing what a compromised component can reach.
Prioritizing means knowing which flaws matter to your particular system.
Patching safely means knowing what a change will break before it breaks it.

Each of those is an act of understanding before it’s an act of tooling. Tooling cannot supply what the understanding is missing.

This is what we might call asymmetric defence. Segmentation, identity controls and monitoring built for live decisions give defenders an edge that doesn’t depend on out-pacing the attacker. They depend instead on systems whose internal boundaries are clear enough to reason about under pressure. A defence that can move at machine speed is only as good as the comprehension beneath it. Point it at a system nobody understands and all you have is a fast response to a problem you can’t properly see.
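To make the containment point concrete, here is a minimal sketch of the question "what can a compromised component reach?". It assumes someone has already written down which component can call which; the component names and the REACH map are hypothetical, and in practice that map would have to be derived from network policy, identity grants or service configuration.

```python
from collections import deque

# Hypothetical map of which component can reach which others.
# This is the part no tool can invent: someone has to know it is accurate.
REACH = {
    "web": ["orders", "auth"],
    "orders": ["payments-db", "queue"],
    "auth": ["users-db"],
    "queue": ["worker"],
    "worker": ["payments-db"],
    "reporting": ["users-db"],
}


def blast_radius(graph, compromised):
    """Return every component reachable from the compromised one, excluding itself."""
    seen = {compromised}
    pending = deque([compromised])
    while pending:
        current = pending.popleft()
        for neighbour in graph.get(current, []):
            if neighbour not in seen:
                seen.add(neighbour)
                pending.append(neighbour)
    seen.discard(compromised)
    return seen


print(sorted(blast_radius(REACH, "orders")))
# prints ['payments-db', 'queue', 'worker']
```

The traversal itself is trivial. The answer is only as trustworthy as the map, which is the comprehension the tooling depends on.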

The tension sharpens as generation becomes the default. The exploitable surface grows while the pool of people who understand any given part of it shrinks. Speed on the attack side meets insolvency on the defence side. That’s the trap — faster patching walks straight into it.

Code is cheap; understanding is not

Before we go further, it helps to be precise about three words that are often used interchangeably:

Code is the text. It’s now cheap and fast to produce; it’s what a model generates.
Software, or the system, is the running whole and its behaviour over time. It’s what you operate, what you defend and the thing comprehensibility is a property of.
Architecture, meanwhile, is the structural discipline: the boundaries, segmentation and contracts that keep the system understandable. Architecture is an instrument for protecting the asset. It is not the asset itself.

The distinction matters because it clarifies what can and cannot be automated. Code can be produced quickly. The structure and the understanding around it cannot, at least not without serious medium-to-long-term trade-offs.

As a Thoughtworks piece on why code quality still matters argues, a model can generate or refactor code but it cannot define a system's guarantees, because those guarantees (its invariants, constraints, type safety and security boundaries) come from the structure rather than the syntax.

Experiments at Thoughtworks bear this out: when vibe coding is left unsupervised, it produces software that functions but resists understanding. The Technology Radar's caution on complacency with AI-generated code points the same way. The text arrives; the comprehensibility does not come with it.

From exhortation to enforcement

If comprehensibility is an asset, the practices that protect it are the instruments that service the debt. The shift those instruments require is from exhortation to enforcement, because at the volume code is now generated, a suggestion is not a control. The point is made plainly in The VibeSec reckoning: prompting a model to behave is a suggestion, while a coverage threshold or a deployment gate is enforcement, and anything you genuinely cannot allow has to be codified as a non-negotiable rule somewhere in the lifecycle. Telling an agent to be careful is not the same as making the unsafe path impossible.
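The difference between a suggestion and a control can be sketched as a small deployment gate. This is an illustration only: the threshold, the setting names and the shape of the report are all assumptions made for the example, not the output format of any particular tool.

```python
# Hypothetical non-negotiable rules; the values and setting names are illustrative.
MIN_COVERAGE = 0.80
FORBIDDEN_SETTINGS = {
    "storage_public": True,        # no publicly readable storage
    "iam_wildcard_actions": True,  # no "*" permissions
}


def gate_violations(report):
    """Return a list of reasons the change must not ship; an empty list means pass."""
    violations = []
    # A missing coverage figure counts as zero, so the gate fails closed.
    coverage = report.get("coverage", 0.0)
    if coverage < MIN_COVERAGE:
        violations.append(
            f"coverage {coverage:.0%} is below the required {MIN_COVERAGE:.0%}"
        )
    for setting, banned_value in FORBIDDEN_SETTINGS.items():
        if report.get(setting) == banned_value:
            violations.append(f"{setting} is set to a forbidden value")
    return violations


def exit_code(report):
    """Print each violation and return 1 if any exist, otherwise 0."""
    violations = gate_violations(report)
    for violation in violations:
        print("BLOCKED:", violation)
    return 1 if violations else 0
```

A pipeline step would pass the value of exit_code(report) to the process exit status, so a violation stops the deployment rather than producing a warning someone has to notice.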

The mechanism that holds this together is the harness. Birgitta Boeckeler’s work on harness engineering distinguishes guides that steer a model before it acts from sensors that check its output afterwards, and deterministic computational controls such as linters and test suites from slower, inferential ones such as AI review. Within that frame, three instruments do most of the work of keeping a system comprehensible.

The first is enforced specification. Tests and specs should be executable, gating definitions of what the system must and must not do, rather than prose that no one reads after the first sprint. This keeps intent legible to whoever inherits the code. Approaches such as spec-driven development show how to make the specification the control rather than the comment.
The second is boundaries. Contracts and strict encapsulation keep comprehension local, so that changing or containing one part does not require holding the whole system in your head. This is the same move that limits blast radius, which is why it pays back twice, once as comprehensibility and once as containment. The near-misses documented in the VibeSec piece, where a model reached for public storage and over-broad permissions, are failures of exactly these boundaries.
The third is observability. Instrumentation built for real-time decisions, rather than for compliance reporting, keeps the running system legible while it runs, which is the only time that legibility helps during an incident.

What ties the three together is a principle worth stating directly: rigor doesn’t disappear when a model writes the code. It relocates to these instruments and matters more, not less. Rachel Laycock's recent CTO newsletter makes this argument, and the 2025 DORA report supplies the evidence that AI amplifies the strengths and weaknesses already present in how a team works.

Make it measurable, then fund it

An asset you cannot measure is one you will not protect. That's why comprehensibility needs rough instrumentation even before it has precise metrics.

Here are a few starting points:

The proportion of a running system any single engineer can reason about without help.
The mean time to localize a fault.
How much of the codebase has no current owner who understands it.
Specification and observability coverage as proxies for how much intent is encoded and how much of the system is visible.

These are first approximations rather than a finished index. They should be refined for your own context.
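As a rough illustration, two of these starting points can be computed from records many teams already keep. The sketch below assumes a hand-maintained ownership inventory and an incident log with detection and localization timestamps; the module names, engineer labels and figures are all made up for the example.

```python
from datetime import datetime
from statistics import mean

# Hypothetical inventory: module -> engineers who say they can explain it today.
OWNERSHIP = {
    "billing": ["engineer-a"],
    "auth": ["engineer-a", "engineer-b"],
    "legacy-export": [],
    "notifications": [],
}

# Hypothetical incident log: (fault detected, fault localized).
INCIDENTS = [
    (datetime(2026, 5, 1, 9, 0), datetime(2026, 5, 1, 9, 40)),
    (datetime(2026, 5, 9, 14, 0), datetime(2026, 5, 9, 16, 0)),
]


def unowned_share(ownership):
    """Fraction of modules with no current owner who understands them."""
    if not ownership:
        return 0.0
    unowned = sum(1 for owners in ownership.values() if not owners)
    return unowned / len(ownership)


def mean_minutes_to_localize(incidents):
    """Average minutes between detecting a fault and locating it, or None if no data."""
    if not incidents:
        return None
    return mean(
        (localized - detected).total_seconds() / 60
        for detected, localized in incidents
    )


print(unowned_share(OWNERSHIP))             # prints 0.5
print(mean_minutes_to_localize(INCIDENTS))  # prints 80.0
```

Neither number says much in isolation. Tracked over time, a rising unowned share or a lengthening time to localize is one signal that the asset is depreciating.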

The reason to measure it is to fund it. It should be clear that we need to treat managing vulnerabilities as a first-class investment rather than a cost to minimize — comprehensibility deserves the same standing. Seen this way, secure-by-design, which the Five Eyes statement asks organizations to make standard practice, is largely a matter of designing for comprehensibility, with security woven in from the start rather than added as a final layer.

Defend the asset

The window will keep shrinking; no tool you buy will change that. What you can change is whether the systems on the other side of the alarm are ones your people still understand. Defend comprehensibility as the asset and use architecture, contracts, specifications and observability as the instruments that keep it solvent. Security can then follow as the payoff rather than the starting point.

The question to put to your own estate is not how fast you can patch; it’s how much of what you run you could still explain.
