Navigating today’s AI token crisis

The fresh urgency of two particular principles of the Agile Manifesto

Matt Kamelman

Published: June 10, 2026

The second principle of the Agile Manifesto — "welcome changing requirements, even late in development. Agile processes harness change for the customer's competitive advantage" — was written in 2001 to address a common failure mode in software delivery: treating plans and specifications as protection against uncertainty. The authors argued that change isn’t a disruption to be controlled, but instead a source of competitive advantage.

Twenty-five years later, that principle describes something more specific and urgent for organizations building with AI: a structural requirement for surviving an economic architecture that is changing faster than most organizations are designed to adapt to.

Over the last year, most enterprises concentrated on the visible surface of AI adoption — models, copilots, agents, productivity claims. The less visible shift happened underneath. AI has moved from a software procurement decision into an architectural governance problem.

What started as a finance issue has not stopped being one, but has, in fact, expanded. Token spend is now simultaneously a finance problem, an engineering design problem, a delivery governance problem and, for organizations that haven't yet noticed, a strategic liability. The difficulty is that none of those functions currently owns the aggregate.

The evidence accumulated from several directions simultaneously:

Uber's CTO acknowledged that his company had burned through its entire Claude Code budget for 2026 by April. Uber's President and COO Andrew Macdonald then stated he couldn’t draw a clear connection between rising token consumption and an increase in useful consumer features — a candid admission that the organization had been spending ahead of demonstrated value.
Microsoft ended most internal Claude Code licenses and moved engineers toward Copilot CLI.
GitHub moved from predictable subscription pricing to usage-based AI credits, forcing organizations to think about consumption patterns in ways they previously reserved for cloud infrastructure.
Duolingo reversed its decision to evaluate developer performance using AI activity metrics after discovering that usage does not reliably correlate with value creation.

None of these are isolated product decisions. Taken together, they illustrate that the first wave of enterprise AI adoption was built on economic assumptions that are already breaking.

Why token budgeting is hard

The symptoms are showing up in budget reviews in ways that follow a consistent pattern. "In April and May, I started hearing from companies: 'Oh my god, we are 3x over our entire 2026 token budget and it's only April,'" said J.R. Storment, executive director of the FinOps Foundation. One company reportedly ran up a $500 million AI bill in a single month after failing to set usage limits. A Priceline employee described a routine Cursor contract renewal coming back four to five times more expensive. What makes this even more challenging is that the consumption pattern isn’t linear; agentic systems don't behave like traditional software. Unlike CPU utilization, which saturates and crashes a container, an unconstrained agent loop continues consuming tokens while appearing healthy. The first indication of failure may arrive as an invoice.

The engineering diagnosis

The engineering diagnosis is relatively clear once you look for it. Large portions of enterprise AI traffic are still being routed through premium reasoning models for tasks that don't require premium reasoning: classification workloads, validation pipelines, structured transformations. It’s the 2026 version of deploying a Kubernetes cluster to run a cron job. Other costs accumulate from agent design choices. Retry instructions without hard boundaries create expensive loops that are invisible to standard infrastructure monitoring. Stateful systems prepend thousands of tokens of conversation history to every new request, while RAG pipelines retrieve generously because retrieval appears cheap relative to generation. Each decision seems rational in isolation, but collectively they become structural waste. These are architecture problems that require architecture solutions: semantic caching, tiered model routing and token circuit breakers that terminate runaway loops and escalate to human operators before the damage compounds.
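Two of the remedies named above, tiered model routing and a token circuit breaker, can be sketched in a few lines. This is a minimal illustration and not a reference implementation: the tier names, the task categories, the `step_fn` callback and the specific limits are all hypothetical, and a production system would meter tokens from the provider's actual usage data.

```python
from dataclasses import dataclass

# Hypothetical task categories that do not need a premium reasoning model.
CHEAP_TASKS = {"classification", "validation", "structured_transformation"}


def pick_tier(task_type: str) -> str:
    """Route routine workloads to a small model and everything else to a premium one."""
    return "small" if task_type in CHEAP_TASKS else "premium"


class TokenBudgetExceeded(RuntimeError):
    """Raised when an agent run crosses its token or step ceiling."""


@dataclass
class TokenCircuitBreaker:
    max_tokens: int       # hard ceiling on tokens for one agent run
    max_steps: int        # hard ceiling on loop iterations, retries included
    tokens_used: int = 0
    steps: int = 0

    def record(self, tokens: int) -> None:
        """Account for one model call and trip if either ceiling is crossed."""
        self.tokens_used += tokens
        self.steps += 1
        if self.tokens_used > self.max_tokens or self.steps > self.max_steps:
            raise TokenBudgetExceeded(
                f"stopped after {self.steps} steps and {self.tokens_used} tokens"
            )


def run_agent(step_fn, breaker: TokenCircuitBreaker) -> str:
    """Call step_fn until it reports completion or the breaker trips.

    step_fn stands in for one agent iteration and returns (done, tokens_spent).
    """
    try:
        while True:
            done, tokens = step_fn()
            breaker.record(tokens)
            if done:
                return "completed"
    except TokenBudgetExceeded:
        # Assumed policy: a tripped breaker hands the run to a human operator.
        return "escalated"


# A step that never finishes and spends 1,000 tokens per iteration.
breaker = TokenCircuitBreaker(max_tokens=5_000, max_steps=20)
print(pick_tier("classification"), run_agent(lambda: (False, 1_000), breaker))
```

The point of the design is that the ceiling lives inside the loop itself, so a runaway agent stops on a budget signal and does not depend on infrastructure monitoring that would still report it as healthy.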

But the harder and more important diagnosis is organizational. Token waste at this scale is rarely caused by individual developers. It’s caused by defaults — model selection defaults, context management defaults, agent design defaults, governance defaults — that were set during a prototyping phase and never revisited before production. While teams made sensible local optimizations, no one owned the aggregate outcome.

This is where the second Agile principle becomes uncomfortable to invoke honestly. The original argument wasn't that organizations should tolerate change. It was that they should be structured to exploit it faster than their competitors. Most enterprises right now are doing neither; they’re being changed by AI economics rather than harnessing the change. The advantage goes to whoever builds the organizational reflex first, not whoever has the best models.

When the constraint is physical and compounding, you cannot optimize your way out of it at the billing layer. You have to move the decisions that generate the cost upstream to where they can actually be governed.

Matt Kamelman

Innovation Choreographer, Thoughtworks

The need for industry standards on tokens

The clearest sign this is becoming a structural issue arrived this week when the Linux Foundation announced its intent to launch the Tokenomics Foundation — an initiative focused on establishing open industry standards, benchmarks and best practices for the economics of AI infrastructure, working in close partnership with the FinOps Foundation.

Research from Goldman Sachs, cited in the announcement, projects global token usage will multiply 24 times between 2026 and 2030, reaching 120 quadrillion tokens per month, with the inference market expanding from roughly $106 billion in 2025 to $255 billion by 2030. Initial supporters already include Accenture, Booking.com, IBM, KPMG, Oracle, Salesforce, SAP and ServiceNow.
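As a rough check on what those projections imply, the figures can be converted into constant yearly growth rates. The sketch below uses only the numbers quoted above. It assumes that the 24x multiple is measured against 2026 monthly usage and that growth compounds evenly, neither of which the announcement is confirmed to state.

```python
# Figures quoted in the text; the baseline interpretation is an assumption.
TOKEN_MULTIPLE = 24              # growth in monthly token usage, 2026 to 2030
TOKENS_2030_QUADRILLION = 120    # projected monthly tokens in 2030
MARKET_2025_BN = 106             # inference market in 2025, USD billions
MARKET_2030_BN = 255             # inference market in 2030, USD billions


def annual_factor(multiple: float, years: int) -> float:
    """Constant yearly growth factor that compounds to `multiple` over `years`."""
    return multiple ** (1 / years)


implied_2026_tokens = TOKENS_2030_QUADRILLION / TOKEN_MULTIPLE
token_factor = annual_factor(TOKEN_MULTIPLE, 2030 - 2026)
market_factor = annual_factor(MARKET_2030_BN / MARKET_2025_BN, 2030 - 2025)

print(f"implied 2026 usage: {implied_2026_tokens:.0f} quadrillion tokens per month")
print(f"token volume: about {token_factor:.2f}x per year")
print(f"inference market: about {(market_factor - 1) * 100:.0f}% per year")
```

Under those assumptions, token volume slightly more than doubles each year while the market grows at roughly a fifth per year. Read together, the two projections imply a falling price per token alongside rising total spend.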

Industries don't create standards bodies for temporary problems; they create them when a problem is becoming structural and when no single vendor can be trusted to define the standard.

Beyond cloud-hosted API consumption

There’s a longer arc here that deserves attention. Several organizations are beginning to explore a trajectory beyond cloud-hosted API consumption: running capable models locally, on-device or on private infrastructure. The technical constraints remain real — throughput, model quality, operational complexity — but the gap is closing. Smaller, high-quality open-weight models are increasingly competitive on the task categories that consume the most enterprise token spend. The economics of self-hosted inference are becoming viable precisely because the economics of hosted inference are becoming more and more painful.

Teams experimenting with local deployment today aren’t being scrappy; they’re building the architectural judgment about which workloads belong on private infrastructure versus which require frontier capability — judgment that cannot be purchased when you need it and has to be developed in advance.

What's worth noting is that the token crisis is not the origin of this pressure — it is a downstream symptom of it. The physical infrastructure supporting AI scaled faster than the energy systems built to sustain it. Data center energy consumption is approaching 1,050 TWh in 2026, which would make data centers the fifth largest energy consumer in the world if counted as a country. Ireland's data centers are projected to consume 32 percent of the country's total electricity supply, a figure that has driven regulatory moratoriums on new grid connections.

The pattern isn't limited to one jurisdiction. Across multiple markets, projects secured years ago are stalled waiting for grid connections that don't yet exist. The token costs enterprises are managing today are, in part, the price signal of that physical scarcity being passed downstream. And this is precisely why the response has to be architectural rather than financial. The organizations responding well are re-architecting their entire delivery lifecycle. Upstream phases are shifting toward clearer planning, stronger specifications and deliberate context engineering; downstream phases toward containment boundaries and runtime oversight.

When the constraint is physical and compounding, you cannot optimize your way out of it at the billing layer. You have to move the decisions that generate the cost upstream, to where they can actually be governed.

Tuning and adjusting behavior accordingly

This is where the Agile parallel runs deepest and where the more challenging Agile principle actually applies. The twelfth and final principle of the manifesto reads:

"At regular intervals, the team reflects on how to become more effective, then tunes and adjusts its behavior accordingly."

That principle was always about something more durable than requirements management; it was about organizations developing a systematic capacity for self-correction rather than merely reacting to each change as it arrives. The teams that will navigate AI economics well over the next five years are not the ones that respond correctly to this year's cost structure. They’re the ones that build the organizational metabolism to respond correctly to the next one (and then the one after that).

The second principle tells you to welcome change and harness it for advantage; the twelfth tells you how: by building an organization that gets better at adapting, continuously, rather than one that occasionally adapts when it has no other choice.

Twenty-five years after the manifesto was written and signed, both principles are more literal than their authors likely intended.
