AI in the Age of Tokens: After the Hype, the Bill
By Pierre Wilmet
The New Subscription Model for AI: Tokens
On June 1, 2026, GitHub Copilot switched all of its offerings to pay-as-you-go billing. “Premium requests” give way to GitHub AI Credits, which are calculated from the tokens actually consumed as input, output, and cache. While base subscription prices haven't changed, the included usage allowance has been drastically reduced. And most importantly, the safety net that let users fall back to the free model once their quota was exhausted no longer exists.
On Reddit and X, some developers are already talking about a “Tokenpocalypse.” They're posting projected bills that jump from a few dozen to several thousand dollars a month. Whether exaggerated or not, these figures tell us one thing: AI comes at a price, and until now that price hasn't been accurately reflected in what users pay.
The stakes are enormous. Just as the kilowatt-hour is for electricity, the token is becoming the unit of consumption for AI. What's scary isn't the price of tokens, but what they reveal about the true cost of AI.
What is a token?
For an AI model to read a text, the text is first broken down into smaller units called tokens. A token can be a word, part of a word, a punctuation mark, a space, and so on. Together, these tokens form the vocabulary the model uses to read, predict, and generate text. A complex or rare word is typically split into more tokens than a short, common one, which naturally affects what it costs to process.
When you use AI, three types of tokens come into play:
- Input tokens: what you provide to the model. Your question, along with everything that makes it up: the text, the context, any files. They represent everything you send to the AI.
- Output tokens: the counterpart of input tokens. They represent everything in the AI's response (text, spreadsheet, image, etc.).
- Cached tokens: the context of your previous interactions with the AI, which has already been processed and is reused as needed. Generally cheaper, but not free.
This is the main difference between token-based billing and request-based billing: not all requests are equal in terms of tokens. Some very complex requests cost 100 times more than others. And every token consumed by the model is billed.
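To make that gap concrete, here is a minimal sketch of how a token-based bill could be computed. The per-million-token prices and the two usage profiles are invented for illustration; they are not GitHub's or any provider's actual rates.

```python
from dataclasses import dataclass

# Hypothetical prices, in dollars per million tokens (illustrative only).
PRICE_PER_MILLION = {"input": 3.0, "output": 15.0, "cached": 0.3}


@dataclass
class Usage:
    input_tokens: int
    output_tokens: int
    cached_tokens: int = 0


def request_cost(usage: Usage) -> float:
    """Return the cost in dollars of one request under the assumed prices."""
    total = (
        usage.input_tokens * PRICE_PER_MILLION["input"]
        + usage.output_tokens * PRICE_PER_MILLION["output"]
        + usage.cached_tokens * PRICE_PER_MILLION["cached"]
    )
    return total / 1_000_000


# Two made-up requests: a short question and a long agent-driven session.
quick_question = Usage(input_tokens=500, output_tokens=200)
agent_session = Usage(
    input_tokens=400_000, output_tokens=30_000, cached_tokens=1_000_000
)

print(f"quick question: ${request_cost(quick_question):.4f}")
print(f"agent session:  ${request_cost(agent_session):.2f}")
```

Under request-based billing, both of these would have counted as one request. Under the assumed prices, they differ by more than two orders of magnitude.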
Why make this change now?
Over the past year, Copilot has evolved significantly. From a simple code-completion tool, it has transformed into an agent-based platform capable of analyzing a repository, modifying multiple files, generating pull requests, performing code reviews, and even chaining steps together.
That's why the request-based business model was no longer viable: the gap between the tokens consumed by a single-line code suggestion and those consumed by an agent-driven session keeps widening. The gap is even more pronounced for input tokens, because the agent reloads its context at each step.
This is just one example among many. Since the bill depends on the agents' behavior, and as they become increasingly autonomous, the user's intentions have less and less impact on it.
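The effect of reloading context at each step can be sketched with a few lines of arithmetic. This is a simplified model under stated assumptions: every step resends the whole accumulated context, nothing is cached, and each step adds a fixed amount of new material. The function name and the figures are hypothetical.

```python
def total_input_tokens(base_context: int, added_per_step: int, steps: int) -> int:
    """Sum the input tokens sent over a session in which each step
    resends the whole context accumulated so far (no caching assumed)."""
    total = 0
    context = base_context
    for _ in range(steps):
        total += context            # the full context is sent again at this step
        context += added_per_step   # tool output and replies grow the context
    return total


# A single completion versus a 20-step agent session on the same starting context.
single_suggestion = total_input_tokens(base_context=2_000, added_per_step=0, steps=1)
agent_run = total_input_tokens(base_context=2_000, added_per_step=1_500, steps=20)

print(single_suggestion, agent_run)
```

In this model the total grows roughly with the square of the number of steps, which is why a session that runs twice as long can cost far more than twice as much.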
“Blind” precision
Few people dispute the principle of token-based billing. The logic is clear, and usage becomes measurable: we can optimize prompts, choose the model best suited to our needs, and cut unnecessary context. Providers, for their part, get an alternative to two bad options: throttling all users, or watching costs rise faster than revenue.
The flip side is that the bill becomes unpredictable: no one knows in advance how many tokens a task will consume. Even though it's fairly simple to outline guidelines for writing cost-effective prompts, you're never safe from a wrong path explored by the agent, unnecessary follow-up steps, or too many files being read.
The problem is that if every iteration comes at a cost, we hesitate to experiment. Yet a significant part of AI's value comes precisely from exploration. The risk of unnecessary spending can kill the urge to ask imperfect questions, request an alternative, or have a line of reasoning reviewed.
What about technical teams?
We're shifting from asking “Which model is most effective for a given task?” to “Which model will give the most relevant answer, at the most reasonable cost, with the smallest possible margin of error?”
There are therefore three key reflexes to keep in mind:
First, tailor the model to the task. Rewording an error message or generating a basic test doesn't require the most powerful model. An architecture analysis or a critical migration does.
Next, treat context as an asset. Sending the right files, the right logs, and the right recent changes produces a better result with fewer tokens than sending an entire repository “just in case.”
Finally, manage consumption the same way we already manage the cloud. “AI FinOps” practices are emerging: team-based budgets, alerts, automatic model selection, context caching, spending limits, audits, dashboards, and more. The issue is no longer just technical but financial, operational, and managerial.
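As an illustration of the third reflex, here is a minimal sketch of a team budget with an alert threshold and a hard spending limit. The class, its fields, and the 80% threshold are assumptions made for the example; they do not describe any existing tool or any GitHub feature.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TeamBudget:
    """Hypothetical monthly AI budget for one team, in dollars."""
    monthly_limit: float
    alert_ratio: float = 0.8          # warn once 80% of the budget is spent
    spent: float = 0.0
    alerts: List[str] = field(default_factory=list)

    def record(self, cost: float) -> bool:
        """Record a charge. Return False, without charging, if it would
        push spending past the monthly limit."""
        if self.spent + cost > self.monthly_limit:
            self.alerts.append("limit reached: request refused")
            return False
        self.spent += cost
        if self.spent >= self.alert_ratio * self.monthly_limit:
            self.alerts.append(
                f"warning: {self.spent:.2f} of {self.monthly_limit:.2f} spent"
            )
        return True


budget = TeamBudget(monthly_limit=100.0)
first = budget.record(50.0)    # accepted, below the alert threshold
second = budget.record(35.0)   # accepted, crosses the alert threshold
third = budget.record(20.0)    # refused, would exceed the limit
print(budget.spent, budget.alerts)
```

A refused request is the blunt version of a spending limit. A real setup might instead fall back to a cheaper model or ask for approval, which is a design choice for each team to make.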
What's next?
First and foremost, token-based billing makes people conscious of how they use AI. In the short term, it will push users to make better use of their tokens, their prompts, and the different models at their disposal. In the medium term, products will build in more automatic optimizations: context reduction, dynamic model selection, cost estimation before execution, and so on.
We're leaving the experimental phase for a more reasoned, industrial one; we'll be chasing the best cost-to-quality ratio for the response.
The token is therefore more than a technical unit: it's a new unit of decision-making and a new cost line that a company can no longer ignore.
The question we'll leave you with: would you rather pay for AI based on actual usage (and spend time thinking about every prompt), or a predictable flat rate (part of which goes toward funding other people's usage)?