AI & Strategy · June 3, 2026 · 13 min read

AI cost management: why Uber and Microsoft are pulling back on AI

On exploding token usage, the myth that cutting AI is smart, and what every organisation can set up tomorrow.

Last month I burned through a month's budget in a few days. Not because of one expensive mistake. Simply by using Claude Code and Cursor the way they are meant to be used.

This is exactly what happened at Uber. Except there it was not a few hundred euros but an annual budget gone in four months.

TL;DR

  • Uber burned through its annual AI budget in four months. Not because the tools underperformed, but because engineers used them intensively and no one had set a spending cap.
  • The explosion is happening among engineers, not the average user. A developer running agents all day can cost $500–$2,000 a month; a colleague asking AI to polish an email costs a fraction of that.
  • Cutting AI broadly means throwing away the savings. The ROI for knowledge workers is real — the cost explosion is concentrated in a small group of heavy builders.
  • Current token prices are subsidised by investors. OpenAI is expected to burn $14 billion in 2026. A price increase of 30–50% within one to two years is realistic.
  • The fix is simple: caps, visibility, and a distinction by user type. Not less AI, but knowing where the meter runs away.

What really happened at Uber, Microsoft and Nvidia

You have probably seen these stories circulating. They have been spreading across LinkedIn and X with the same narrative: look, even the big tech companies are pulling back on AI, the bill is too high, the bubble is bursting.

That is not right. Or more precisely: it is half right, and the missing half is the interesting half.

Uber rolled out Claude Code to around 5,000 engineers in late 2025. By April 2026, the complete AI budget for the entire year was gone. The CTO admitted it to The Information, via Fortune: the budget he thought he needed had already evaporated. Not because the tools underperformed. The opposite. Usage climbed until 70% of all code the company shipped was written with AI assistance, and the share of engineers working with agents jumped from 32% to 84% in two months. An agent is an AI that does not give one answer but autonomously executes an entire sequence of steps, planning and carrying out tasks without you having to sit next to it. Costs per engineer ran to between $500 and $2,000 per month.

Microsoft did something similar. In May the company began revoking internal Claude Code licences in its Experiences + Devices division, the team behind Windows, Office, Teams and Surface. Engineers must switch to GitHub Copilot CLI by 30 June. That is, not coincidentally, the last day of the fiscal year. The official reason is "toolchain unification". The real reason, according to The Next Web, lies in the calendar and the bill. Claude Code became so popular internally that it started displacing Microsoft's own Copilot. It is hard to sell Copilot to the rest of the world while your own employees are overwhelmingly using something else.

Then there was Bryan Catanzaro, VP at Nvidia. He told Axios, as reported by Fortune: "For my team, the cost of compute is far beyond the costs of the employees." The computing power his team consumes is more expensive than the salaries of that same team. It is worth noting that he was talking about his own research team, one of the most compute-intensive teams in existence, so that is not a statement about your office but about the extreme upper end.

Three stories, one common thread. And that thread is not "AI does not work." The thread is: AI works so well that people never put it down, and nobody had thought about what that would cost.

The first cost that scales with enthusiasm

This is the point I almost never see in any analysis. Think about how business costs normally work.

Headcount scales with salary. More people, higher payroll. Predictable. Customers scale with support. More customers, more questions, more service desk. Also predictable. Almost every classic cost driver grows alongside something you can measure and budget.

Token consumption does not. To be clear: a token is a piece of text, roughly a word, and you pay AI tools per token they read and write. The more you let the tool work, the more tokens, the higher the bill. And that consumption grows with how useful and enjoyable your people find the tool. An enthusiastic engineer running agents all day costs you a multiple of a colleague who barely touches it. Same role, same salary, a completely different AI bill. That is a new type of cost, and no budgeting model is built for it.
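To make the arithmetic concrete, here is a minimal sketch of how a per-token bill adds up. The prices and daily volumes are assumptions I made up for illustration, not figures from any provider's price list; only the shape of the calculation matters.

```python
# Illustrative only: the per-token prices and usage volumes below are
# made-up assumptions, not figures from any provider's price list.
PRICE_PER_MILLION_INPUT = 3.00    # hypothetical USD per 1M tokens read
PRICE_PER_MILLION_OUTPUT = 15.00  # hypothetical USD per 1M tokens written


def monthly_cost(input_tokens_per_day: int, output_tokens_per_day: int,
                 working_days: int = 20) -> float:
    """Return the estimated monthly bill in USD for one user."""
    daily = (input_tokens_per_day / 1_000_000 * PRICE_PER_MILLION_INPUT
             + output_tokens_per_day / 1_000_000 * PRICE_PER_MILLION_OUTPUT)
    return round(daily * working_days, 2)


# A colleague using a chat assistant for emails and summaries (assumed volumes).
light_user = monthly_cost(input_tokens_per_day=200_000,
                          output_tokens_per_day=50_000)

# An engineer running agents all day (assumed volumes).
heavy_user = monthly_cost(input_tokens_per_day=15_000_000,
                          output_tokens_per_day=1_000_000)

print(f"light user: ${light_user}/month")
print(f"heavy user: ${heavy_user}/month")
```

With these assumed numbers the light user lands at roughly $27 a month and the heavy user at $1,200. Same formula, same price list, a bill more than forty times higher.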

It gets stranger. At Uber, and according to AI Magazine at Meta too, there were internal leaderboards ranking teams by token consumption. They called it "tokenmaxxing." You were actively encouraged to use more, as if consumption equalled productivity. It does not. It is the meter reading for your enthusiasm, not for your output.

This is exactly the Jevons paradox, which I wrote about earlier in my piece on the golden age of the programmer. Make something more efficient and cheaper, and you do not use less of it. You use much more. The price per token has been falling for years. Total consumption races ahead.

The explosion is among developers, not the average user

This is the part most viral posts flatten completely. "AI is getting too expensive" is far too blunt. The question is not whether AI is expensive. The question is where.

The explosion sits in one specific place: intensive agentic use by developers. Claude Code, Cursor, agents that autonomously traverse an entire codebase, plan steps, read and rewrite files. Such an agent consumes more tokens in a single task than an ordinary user does in a week. At Uber, 11% of live back-end updates were at one point being executed entirely by agents with no human intervention. That is fantastic for speed. And it is a bottomless pit for your token budget.

But the average user in your organisation is nowhere near this. Someone using ChatGPT to rewrite an email, summarise a meeting or structure a document consumes a fraction. For that person the cost picture is very manageable, and AI delivers time savings and genuine value. That is my own experience, and that of virtually everyone I discuss it with. The typical knowledge worker costs you a few tens of euros a month and saves you hours.

That distinction determines whether you draw the right conclusion. If you read the Uber stories and panic-close all AI access, you are solving a problem with a handful of heavy users by throwing away the gains for the vast majority. That is not cost control. That is shooting yourself in the foot.

The mistake is not in AI. The mistake is in treating an agent that runs through the night and a colleague who asks for a quick email check as the same thing.

Who is actually watching the meter?

When cloud computing matured, companies gained a new discipline: FinOps, a contraction of finance and DevOps. It was nothing more than a way to keep a grip on cloud bills that were suddenly no longer predictable, because you were paying for consumption rather than a fixed package. Nobody found it exciting. It just became part of how you run a serious organisation.

AI is now at exactly that same point. Only years earlier than most executives realise. A Dutch analysis by BeSharp puts it sharply: organisations that could rely on fixed licence costs for years are now facing a dynamic cost structure that is hard to predict and difficult to cap. You used to pay a fixed amount per user; now you pay by consumption. Your AI bill is going to look more like an energy bill than a fixed subscription.

And then comes the question that has not yet been asked in most boardrooms. Who actually owns AI consumption? Who holds the brake? In most organisations I speak with, the answer is: nobody. It is spread across teams, no one sees the total, and the first time anyone is shocked is when the annual accounts come in.

That is not an IT detail but a governance responsibility. I wrote earlier about the blind spot in the boardroom when it comes to technical expertise at board level. This is the same blind spot, but with a price tag that grows every month.

The fix is not complicated. Give every user a monthly cap. Set a higher limit for heavy agent users and a lower one for the rest, so the people with the most leverage have headroom and everyone else has a reasonable maximum. Above all: create visibility. A dashboard showing who is consuming what and on what. That is not a luxury. That is the difference between steering and being surprised after the fact.
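A rough sketch of what caps by user type plus visibility could look like in practice. The user types, cap amounts and the `UsageEvent` record are hypothetical names of my own; a real setup would read spend from your provider's billing export or an internal gateway.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical monthly caps in USD per user type; pick your own numbers.
MONTHLY_CAP_USD = {"heavy_builder": 1500.0, "standard": 50.0}


@dataclass
class UsageEvent:
    user: str
    user_type: str   # must be a key of MONTHLY_CAP_USD
    tool: str        # e.g. "agent" or "chat"
    cost_usd: float


def spend_report(events: list[UsageEvent]) -> dict[str, dict]:
    """Aggregate spend per user and flag anyone at or over their cap."""
    totals: dict[str, float] = defaultdict(float)
    types: dict[str, str] = {}
    for event in events:
        totals[event.user] += event.cost_usd
        types[event.user] = event.user_type

    report = {}
    for user, spent in totals.items():
        cap = MONTHLY_CAP_USD[types[user]]
        report[user] = {
            "spent": round(spent, 2),
            "cap": cap,
            "over_cap": spent >= cap,
        }
    return report


# Made-up usage for one month.
events = [
    UsageEvent("ana", "heavy_builder", "agent", 900.0),
    UsageEvent("ana", "heavy_builder", "agent", 700.0),
    UsageEvent("ben", "standard", "chat", 12.5),
]
report = spend_report(events)
for user, row in report.items():
    print(user, row)
```

The point of the design is the split: the cap lives per user type, not per organisation, so a heavy builder hitting the limit never blocks the colleague who only asks for an email check.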

The ROI question becomes unavoidable — and that is healthy

There is another layer I do not want to skip. Because you can get control of consumption, but the real question for an executive is different: how do you justify continuing to spend this much? What is the ROI?

That is where the most painful line in the entire Uber story sits. It was not the CTO's admission that the budget was gone. It came from the COO, Andrew Macdonald, who according to the same Fortune article admitted he could not draw a direct line between all those AI expenditures and demonstrable improvements for the customer. 95% of engineers used AI monthly, 70% of code came out of it, and still he could not prove what it delivered.

That is the real problem. Not the size of the bill, but the absence of an answer to what you got back for it. When AI was cheap and new, nobody had to ask that question. You joined in because you had to. That phase is over. When the bill becomes serious, the ROI question becomes unavoidable.

And that is healthy. It forces you to ask the right question. Not "how much does AI cost us?" but "what do we keep after subtracting consumption, oversight and rework?" 70% AI-generated code sounds impressive until you ask what it delivered, net, in working functionality. Consumption is not output. An analysis of AI value realisation puts it well: without that connection, AI remains an uncontrollable cost rather than a strategic investment.
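Written out, that question is a single subtraction. A minimal sketch, with made-up monthly figures for one hypothetical team; the hard part in reality is estimating the inputs, not the formula.

```python
def net_value(gross_value: float, consumption: float,
              oversight: float, rework: float) -> float:
    """What is left once the costs of using AI are subtracted."""
    return gross_value - consumption - oversight - rework


# Hypothetical monthly figures for one team, all in the same currency.
team_net = net_value(gross_value=40_000, consumption=12_000,
                     oversight=6_000, rework=9_000)
print(team_net)
```

In this invented case less than a third of the gross value survives, even though the token bill is the smallest-looking problem on paper.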

There is a surprisingly simple way to answer that question. Prosus, the parent company behind Takeaway.com, built 60,000 AI agents and hit exactly this problem: how do you determine the value of an agent? They came up with the "delete it tonight" test. Managers were asked one question: what happens if we permanently switch off this agent tonight — what does that cost in revenue, expenses or time? Suddenly they could quantify the ROI.

We used to do the same thing, we just called it the squeak test. If we were not sure whether anyone was still using a feature, we just turned it off and waited a week. If nobody squeaked, the answer was clear. If someone did squeak, you instantly knew who the real user was and what it was worth to them. Same logic, different words. The power is in the reframing: not "what does this do for us" but "what do we miss when it is gone."

And then there is the price you cannot control

Everything so far has been about costs you can steer yourself. Your consumption, your quotas, your ROI. But there is a second layer beneath this whole story, and it is more uncomfortable because it lies outside your own influence.

The price you pay per token today is probably not the real price. The big AI labs are currently operating at a loss. OpenAI is expected to burn $14 billion in 2026, and according to Axios the margins of model providers remain negative despite falling token prices. That is not accidental. They are deliberately selling below cost to capture market share, funded by the largest capital injection in tech history. The price list on your API dashboard is subsidised by investors who eventually want their money back.

Multiple analyses, including one from AI Automation Global, assume those prices will normalise within one to two years. Upward, that is, with estimates of 30 to 50 percent. Not because the technology gets more expensive but because the subsidy eventually ends.

Here is the interesting part. For you as a user this is no reason to panic — it is actually a windfall while it lasts. You are building on compute that is cheaper than it should be. But there is one condition. You cannot build your entire operations on the silent assumption that this price will last.
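One way to test that assumption is a small stress test: take today's spend and apply the 30 and 50 percent estimates mentioned above. The spend and budget figures are hypothetical, and the sketch assumes consumption stays flat, which is optimistic.

```python
def stress_test(current_monthly_spend: float, budget: float,
                increases: tuple[float, ...] = (0.30, 0.50)) -> list[dict]:
    """Project monthly spend under each assumed price increase."""
    results = []
    for increase in increases:
        projected = current_monthly_spend * (1 + increase)
        results.append({
            "increase_pct": round(increase * 100),
            "projected_spend": round(projected, 2),
            "within_budget": projected <= budget,
        })
    return results


# Hypothetical organisation: 8,000 per month today against a 10,500 budget.
scenarios = stress_test(current_monthly_spend=8_000, budget=10_500)
for row in scenarios:
    print(row)
```

In this invented case the 30 percent scenario still fits the budget and the 50 percent scenario does not. That is the kind of answer you want to have before the price list changes, not after.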

That is a governance question, not a procurement detail. Anyone who now makes a process entirely dependent on one model from one provider at the current price is taking a risk that has nothing to do with their own consumption. The insurance against it is called model independence: ensure you can switch providers if the price shifts. That is the same reasoning you would apply to any other critical supplier. With AI almost everyone forgets it, because prices have only fallen so far.
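In code, model independence mostly means that your processes talk to a thin interface of your own instead of to one vendor directly. A minimal sketch of the idea: `ModelProvider`, `StubProvider` and the vendor names and prices are all hypothetical, and the stub returns a canned string where a real wrapper would call a vendor SDK.

```python
from typing import Protocol


class ModelProvider(Protocol):
    """Anything that can complete a prompt and report its price."""
    name: str
    price_per_million_tokens: float

    def complete(self, prompt: str) -> str: ...


class StubProvider:
    """Placeholder standing in for a real vendor SDK wrapper."""

    def __init__(self, name: str, price_per_million_tokens: float) -> None:
        self.name = name
        self.price_per_million_tokens = price_per_million_tokens

    def complete(self, prompt: str) -> str:
        # A real implementation would call the vendor's API here.
        return f"[{self.name}] response to: {prompt}"


def cheapest(providers: list[ModelProvider]) -> ModelProvider:
    """Pick the provider with the lowest current price."""
    return min(providers, key=lambda p: p.price_per_million_tokens)


providers = [StubProvider("vendor_a", 10.0), StubProvider("vendor_b", 6.0)]
active = cheapest(providers)
answer = active.complete("Summarise this meeting")

# If vendor_b raises its price, only the configuration changes, not the callers.
providers[1].price_per_million_tokens = 14.0
fallback = cheapest(providers)
print(active.name, "->", fallback.name)
```

Price is only one reason to switch, and in practice quality differences between models make swapping less frictionless than this suggests. The sketch only shows where the seam has to be so that switching is possible at all.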

You do not have a billion-dollar buffer

What strikes me most about the whole discussion: everyone is talking about Uber, Microsoft and Nvidia. But those are exactly the companies that can absorb this.

Uber spent $3.4 billion on R&D in 2025 alone. When the complete AI budget evaporated in four months, the company shrugged, ran an internal review and moved on. An organisation with a billion-dollar buffer can absorb a budget explosion and turn it into a learning experience.

You probably cannot. And that is the real lesson. If the giants with unlimited resources get caught out like this, what do you think happens at a fifty-person company with no dashboards, no quotas, and no one watching the consumption? The margin for error is far smaller there, and that is precisely where almost nothing is being written.

The good news is you can implement the fix today, and it costs nothing. Set a cap per user. Give your heavy builders headroom and everyone else a reasonable maximum. Make sure someone sees the total — not only when the annual accounts arrive. And keep that distinction sharp between the agent running through the night and the colleague asking for a quick email check, because that is where the entire cost distribution lives.

I started this piece with my own bill that evaporated in days. The funny thing is: I did not cut back my usage afterwards. I was getting too much back for that. I just learned where the meter runs away, and I put a brake on it there. That is not a retreat. That is grip.

And that conversation — about who is responsible for what AI costs and delivers — does not start in IT. It starts in the boardroom.