Does every feature we build with AI need a token budget?

Towards hypothesis-driven development

Ben O'Mahony

Published: June 05, 2026

In April, Uber's CTO Praveen Neppalli Naga admitted the company had burned through its entire 2026 AI budget. The main driver was Claude Code, which grew from 32% to 84% adoption across Uber's 5,000 engineers in just four months. Heavy users were spending between $500 and $2,000 a month each in API costs. Naga told The Information he was "back to the drawing board" on AI budgeting.

What’s perhaps most striking about this story is that Uber isn’t an outlier. At Meta, for example, an internal token leaderboard turned token consumption into a status symbol, and employees would reportedly compete for "Token Legend" rank. In one 30-day window, Meta staff burned through 60.2 trillion tokens, a volume that would cost around $900 million. There are similar tales at Microsoft and Salesforce. Taken together, we’re in an era of what newsletter writer Gergely Orosz has termed 'tokenmaxxing': conspicuous consumption of compute as a marker of status.

Estimate value and set a budget

The fix isn’t new tooling. Instead, we need to treat each feature as something that has a value and a budget across its whole life. Both the value estimate and the budget should be visible before the work starts, even if they change over time.

In an AI context, we need to think in terms of a token budget for each feature. It should answer three questions:

Build budget. How many tokens and how much engineering effort will it take to ship?
Run budget. What does it cost per invocation, per user, per month at expected volume?
Maintenance budget. What does it cost to keep working as models change, prompts drift and dependencies shift? Remember people may well build further features on top of this.
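As a rough illustration, the three budgets can be written down as a small data structure and added up over a feature's expected life. This is a minimal sketch rather than a prescribed method: the class names, the fields and every number in the example are hypothetical, and the blended token price is an assumption, not a vendor quote.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunProfile:
    """Hypothetical usage assumptions for one feature."""
    tokens_per_invocation: int        # input plus output tokens, averaged
    invocations_per_user_month: int
    monthly_active_users: int
    usd_per_million_tokens: float     # assumed blended price

    def cost_per_invocation(self) -> float:
        return self.tokens_per_invocation * self.usd_per_million_tokens / 1_000_000

    def cost_per_user_month(self) -> float:
        return self.cost_per_invocation() * self.invocations_per_user_month

    def cost_per_month(self) -> float:
        return self.cost_per_user_month() * self.monthly_active_users


@dataclass(frozen=True)
class FeatureBudget:
    build_usd: float                  # tokens plus engineering effort to ship
    run: RunProfile                   # drives the metered monthly bill
    maintenance_usd_per_month: float  # eval re-runs, model migrations, prompt fixes

    def lifetime_cost(self, months: int) -> float:
        monthly = self.run.cost_per_month() + self.maintenance_usd_per_month
        return self.build_usd + months * monthly


# Illustrative numbers only.
profile = RunProfile(
    tokens_per_invocation=4_000,
    invocations_per_user_month=50,
    monthly_active_users=2_000,
    usd_per_million_tokens=5.0,
)
budget = FeatureBudget(build_usd=15_000, run=profile, maintenance_usd_per_month=500)

print(f"per invocation: ${profile.cost_per_invocation():.2f}")   # $0.02
print(f"per user-month: ${profile.cost_per_user_month():.2f}")   # $1.00
print(f"per month:      ${profile.cost_per_month():,.2f}")       # $2,000.00
print(f"12-month total: ${budget.lifetime_cost(12):,.2f}")       # $45,000.00
```

With these made-up inputs, the build is a third of the first year's cost; the rest is run and maintenance.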

The last one is where teams consistently underinvest. Models deprecate, vendor pricing changes and evals need to be re-run whenever something upstream moves. These are all things that story point estimates don’t account for.

It’s great that AI accelerates the build step, but it doesn’t accelerate finding out whether anyone actually wants the thing. We need to reckon with the fact the bottleneck has now moved to understanding product impact.

Ben O'Mahony

Principal AI Engineer, Thoughtworks

Hypothesis-driven, not technology-driven

A budget is only useful if you’re willing to spend it on the right things and refuse to spend it on the wrong ones. Crafting the value estimate as a hypothesis allows you to do more thorough experimentation.

Most enterprise feature pipelines aren’t hypothesis-driven. They’re opinion-driven, and the opinion is usually weighted towards whoever in the room has the most senior title. That worked (albeit badly) when the marginal cost of a feature was mostly developer salary. It becomes a financial risk when the marginal cost is a metered API bill plus an indefinite tail of maintenance. With AI-powered coding you can create a legacy app in a day.

Yes, it’s great that AI accelerates the build step, but it doesn’t accelerate finding out whether anyone actually wants the thing. Ultimately, we need to reckon with the fact that the bottleneck has moved to understanding product impact. Remember, too, that the value estimate is back-of-the-envelope maths, not a research problem in its own right.
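The back-of-the-envelope maths might look something like the sketch below. It treats the value estimate as a hypothesis with a rough confidence attached, then asks how confident you would need to be for the expected value to cover the monthly bill. The `ValueHypothesis` class, its fields and the figures are all hypothetical, chosen only to show the shape of the calculation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValueHypothesis:
    statement: str                # e.g. "We believe X will result in Y"
    monthly_value_if_true: float  # USD per month if the hypothesis holds
    confidence: float             # rough prior between 0 and 1, a judgment call
    monthly_cost: float           # run plus maintenance, USD per month

    def expected_monthly_value(self) -> float:
        return self.confidence * self.monthly_value_if_true

    def break_even_confidence(self) -> float:
        """Confidence at which expected value just covers the monthly cost."""
        if self.monthly_value_if_true <= 0:
            return float("inf")   # no upside, so no confidence level is enough
        return self.monthly_cost / self.monthly_value_if_true

    def worth_testing(self) -> bool:
        return self.expected_monthly_value() > self.monthly_cost


# Illustrative numbers only.
h = ValueHypothesis(
    statement="Auto-drafted replies will cut support handling time",
    monthly_value_if_true=20_000,
    confidence=0.3,
    monthly_cost=2_500,
)

print(f"expected value: ${h.expected_monthly_value():,.0f}/month")  # $6,000/month
print(f"break-even confidence: {h.break_even_confidence():.1%}")    # 12.5%
print(f"worth testing: {h.worth_testing()}")                        # True
```

The break-even figure is often the more useful output: it turns "is this a good idea?" into "are we at least this sure?", which is a question a cheap experiment can answer.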

Software has always had a maintenance problem

While organizations are okay at estimating the cost of building a feature, they're often bad at estimating the cost of keeping it alive. This means enterprise applications end up bloated with nice-to-have capabilities that were never profitable, propped up by a small core that actually generates revenue. Businesses forget that every feature in an enterprise codebase ultimately has a running cost: servers, on-call rotations, security patches, the cognitive overhead of one more thing in the system.

This is exacerbated by AI. The marginal cost of running new features is variable and scales with usage rather than infrastructure size. A poorly scoped agent loop can quietly chew through thousands of dollars before anyone notices. The situation isn’t unlike Disney’s Fantasia: in this instance, the brooms are running on someone else's GPU cluster.
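To make the runaway-loop risk concrete, here is a minimal sketch of how input tokens add up in an agent loop. It assumes the agent re-sends its whole, growing context on every step, with no caching and no truncation, and it counts input tokens only. Those assumptions, the function names and the numbers are illustrative, not a description of any particular agent framework.

```python
def agent_loop_tokens(steps: int, base_context: int, tokens_added_per_step: int) -> int:
    """Total input tokens when each step re-sends the full context so far."""
    return sum(base_context + i * tokens_added_per_step for i in range(steps))


def agent_loop_cost(steps: int, base_context: int, tokens_added_per_step: int,
                    usd_per_million_tokens: float) -> float:
    tokens = agent_loop_tokens(steps, base_context, tokens_added_per_step)
    return tokens * usd_per_million_tokens / 1_000_000


# Illustrative numbers only: a 2,000-token starting context that grows
# by 1,000 tokens per step.
short_run = agent_loop_tokens(steps=10, base_context=2_000, tokens_added_per_step=1_000)
long_run = agent_loop_tokens(steps=100, base_context=2_000, tokens_added_per_step=1_000)

print(short_run)  # 65000
print(long_run)   # 5150000
```

Under these assumptions, ten times as many steps costs roughly 79 times as many input tokens, because the context grows with every step. A loop with no step limit has no upper bound on its bill.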

Budgeting tokens in practice

Fortunately, there are things that can be done. A team adopting feature budgeting can start small:

Attach a token budget to every ticket and feature, alongside the usual estimate.
Distinguish build budget from run budget from maintenance budget and track all three.
Tie budget allocation to a stated hypothesis and (ideally) a cheap way to validate it.
Put circuit breakers in place so a runaway agent will be caught in minutes, rather than at the end of the month. Shopify's experience with circuit breakers and usage dashboards is instructive here.
Refuse to fund features whose run budget exceeds their plausible business value, even if the build budget is cheap.
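A token circuit breaker does not need to be elaborate. The sketch below is one possible shape, not Shopify's implementation or any library's API: it tracks tokens spent in a rolling time window and, once the limit is exceeded, blocks further requests until someone resets it. The class and method names, and the limits in the example, are hypothetical.

```python
import time
from collections import deque


class BudgetExceeded(RuntimeError):
    """Raised when a request is attempted while the breaker is open."""


class TokenCircuitBreaker:
    def __init__(self, max_tokens: int, window_seconds: float, clock=time.monotonic):
        self.max_tokens = max_tokens
        self.window_seconds = window_seconds
        self._clock = clock        # injectable so the breaker can be tested
        self._events = deque()     # (timestamp, tokens) pairs inside the window
        self._spent = 0
        self.open = False

    def _evict_old(self, now: float) -> None:
        # Drop spend that has aged out of the rolling window.
        while self._events and now - self._events[0][0] >= self.window_seconds:
            _, tokens = self._events.popleft()
            self._spent -= tokens

    def check(self) -> None:
        """Call before each model request."""
        if self.open:
            raise BudgetExceeded("token budget exceeded; manual reset required")

    def record(self, tokens: int) -> None:
        """Call after each model request with the tokens it consumed."""
        now = self._clock()
        self._evict_old(now)
        self._events.append((now, tokens))
        self._spent += tokens
        if self._spent > self.max_tokens:
            self.open = True       # stays open until a human calls reset()

    def reset(self) -> None:
        self._events.clear()
        self._spent = 0
        self.open = False


# Illustrative usage: at most 1,000 tokens in any 60-second window.
breaker = TokenCircuitBreaker(max_tokens=1_000, window_seconds=60)
breaker.check()
breaker.record(400)   # well under the limit, breaker stays closed
```

Requiring a manual reset is a deliberate choice in this sketch: the point is that a person looks at the spend before the agent carries on, rather than the loop quietly resuming once the window rolls over.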

This isn't frugality for its own sake; Uber's situation is interesting precisely because the tools were too valuable to switch off. The point is to know which features are worth the bill before the bill arrives, and to recognize that some of the most expensive features in the portfolio will be ones nobody ever asked for.

Where next?

For now, the question itself is enough, and it deserves more consideration and reflection in the coming months: should every feature have a token budget? Or, put differently: if you cannot say what revenue a feature will bring in and how much it should cost to run and maintain, do you know enough to build it?

Finally, it's worth remembering Goodhart's Law: when a measure becomes a target, it ceases to be a good measure. A token budget is itself a measure, so it deserves the same caution. How this all plays out will certainly be interesting to watch and will ultimately dictate the shape of AI-assisted and agentic software development in the future.