Token Economics: What Every Developer Needs to Understand Now

The subscription model for AI was always a lie. GitHub just stopped pretending.

The AI Buffet Is Closing

You fire up Copilot on a Monday morning. Kick off an agentic session. Ask it to crawl your repo, write a suite of tests, review three open PRs, and draft a migration plan. It runs for four hours. Autonomous tool calls. Deep context. Multiple model switches.

Cost to you: $10 a month on Pro, $39 on Pro+. Same as the month you only used it to autocomplete a function.

That math never worked. GitHub absorbed the difference (inference costs, model compute, context windows) while you paid a flat subscription. The weekly cost of running GitHub Copilot has nearly doubled since the start of 2026. So on April 27, GitHub made it official with a full announcement on its blog: every Copilot plan moves to usage-based billing on June 1, 2026.

The buffet closes. What opens instead is something closer to a cloud bill.

You've been paying for the gym. Now you're paying for every rep.

To understand why this matters, you need to understand what a token actually is.

Not as jargon. As a thing that now directly affects your bill.

A token is roughly 3 to 4 characters of text. Every interaction with Copilot (your prompt, the code it reads for context, the response it writes back) is measured in tokens. A short question with a quick answer: a few hundred tokens. A multi-file agentic session with repo context loaded: potentially hundreds of thousands.

Here is where it gets expensive. When Copilot works autonomously, it does not just process your prompt once. It loops. It reads files. It calls tools. It checks its own output. Each pass consumes tokens, input and output, across every step of the loop. A four-hour autonomous coding session is not one large prompt. It is thousands of smaller ones, stacked.
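The loop arithmetic is worth making concrete. Here is a back-of-envelope sketch: every number in it (characters per token, context size, step count, per-step sizes) is a hypothetical illustration, not a Copilot spec. The point is that carrying context through hundreds of steps multiplies the cost.

```python
# Back-of-envelope: why an agentic session costs so much more than a chat.
# Assumes ~4 characters per token (a common rough heuristic); all other
# numbers are made up for illustration.

CHARS_PER_TOKEN = 4

def tokens(chars: int) -> int:
    """Rough token estimate from a character count."""
    return chars // CHARS_PER_TOKEN

# A single chat question: one prompt, one answer.
chat = tokens(400) + tokens(2_000)           # prompt + response

# An agentic loop: each step re-sends repo context plus tool output.
context = tokens(60_000)                     # context carried on every step
steps = 500                                  # tool calls over a long session
agentic = sum(context + tokens(3_000) + tokens(1_500) for _ in range(steps))

print(f"chat session:    ~{chat:,} tokens")
print(f"agentic session: ~{agentic:,} tokens ({agentic // chat:,}x the chat)")
```

Under these invented numbers, the agentic session burns four orders of magnitude more tokens than the chat. The exact ratio does not matter; the shape does.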

GitHub's CPO Mario Rodriguez said it plainly in the announcement:

"Today, a quick chat question and a multi-hour autonomous coding session can cost the user the same amount."

That was the problem. Under the subscription model, GitHub had no way to charge for the actual work being done. The heavier your usage, the more GitHub lost. The lighter your usage, the more you subsidised everyone else.

The subscription model for AI was always a lie.

It just took agentic coding to make the lie obvious.

What we saw in our preview bill

I ran our team's April usage through the preview bill GitHub launched in early May. The number that came back: 1.5x our current spend under the new model.

That is before June 1. That is before any of the model multiplier changes kick in. That is on a team that already has reasonable habits around when to use which model. We are not running Claude Opus 4.6 for autocomplete.

A 50% increase, just from the way our existing usage maps to token consumption. If you have not run your own preview yet, do that first. Whatever you are imagining, the actual number is probably worse.

What changes on June 1

Three things shift at once, and they affect you differently depending on your plan.

If you're on a monthly plan, you auto-migrate to usage-based billing on June 1. Your subscription price stays the same: $10/month for Pro, $19/user/month for Business, $39/user/month for Pro+/Enterprise. But now that price buys you an allotment of GitHub AI Credits, not an unlimited number of premium requests. One AI Credit equals $0.01 USD. You can set a budget for additional credits if you go over.
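The credit arithmetic is simple enough to sketch. The $0.01-per-credit rate and the Business figures ($19/user/month, 1,900 included credits) are from the announcement; the usage numbers below are hypothetical.

```python
# Sketch: translating GitHub AI Credits into dollars.
# 1 AI Credit = $0.01 USD (per the announcement); usage is hypothetical.

CREDIT_USD = 0.01

def monthly_cost(included_credits: int, used_credits: int, base_usd: float) -> float:
    """Subscription price plus overage beyond the included allotment."""
    overage = max(0, used_credits - included_credits)
    return base_usd + overage * CREDIT_USD

# A Business seat at exactly its allotment, then 1,500 credits over.
print(monthly_cost(1_900, 1_900, 19.0))   # -> 19.0
print(monthly_cost(1_900, 3_400, 19.0))   # -> 34.0
```

Note what the second line means: going 80% over your allotment nearly doubles the seat price. The budget cap exists for a reason.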

If you're on an annual plan, you stay on the existing premium request model until your plan expires. But there is a catch: GitHub is changing the model multipliers for annual subscribers on June 1. Staying put is not neutral. It is a price increase in disguise.

If you're an organisation or enterprise, your included credit allotment scales by seat: 1,900 credits per user per month for Business, 3,900 for Enterprise. There is a grace period through September 1, 2026, with higher limits (3,000 and 7,000 respectively) to ease the transition.

And one more change that most people missed: Copilot code review now runs on GitHub Actions. Starting June 1, every PR review with Copilot counts against your included Actions minutes at the same rate as any other Actions workflow. It is a second meter, running quietly alongside the first.

Same price on the label. Very different product in the box.

This is where "same price" starts to hurt

For annual subscribers staying on the request-based model, here is what the model multiplier changes look like on June 1.

| Model | Current Multiplier | New Multiplier | Change |
| --- | --- | --- | --- |
| Claude Sonnet 4 | 1x | 1x | No change |
| Claude Sonnet 4.5 | 1x | 6x | 6x increase |
| Claude Sonnet 4.6 | 1x | 9x | 9x increase |
| Claude Opus 4.5 | 3x | 15x | 5x increase |
| Claude Opus 4.6 | 3x | 27x | 9x increase |
| Gemini 3 Pro | 1x | 6x | 6x increase |
| GPT-4o | 0x | 0.33x | New cost |
| GPT-4.1 | 0x | 1x | New cost |
| GPT-5.1 | 1x | 3x | 3x increase |
| GPT-5.4 | 1x | 6x | 6x increase |
| GPT-5.4 mini | 0.33x | 6x | 18x increase |
| GPT-5 mini | 0x | 0.33x | New cost |

Look at GPT-5.4 mini. Goes from 0.33x to 6x. That is an 18x jump on a model people use because it is fast and cheap. Claude Opus 4.6 goes from 3x to 27x. Claude Sonnet 4.5, probably the most commonly used reasoning model in Copilot right now, goes from 1x to 6x.

These are not rounding errors. They are corrections. GitHub is repricing what these models actually cost to run.

If you are on an annual plan and using any of these models heavily, staying put is the wrong decision.
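A rough way to estimate your own exposure: weight your monthly request counts by the old and new multipliers. The multipliers below come from the table above; the request mix is an invented example team profile, so substitute your own numbers.

```python
# Sketch: exposure to the June 1 multiplier changes for annual subscribers.
# Multipliers are from GitHub's table; the request mix is hypothetical.

OLD = {"Claude Sonnet 4.5": 1, "Claude Opus 4.6": 3, "GPT-5.4 mini": 0.33}
NEW = {"Claude Sonnet 4.5": 6, "Claude Opus 4.6": 27, "GPT-5.4 mini": 6}

# Monthly premium requests per model for a hypothetical team.
usage = {"Claude Sonnet 4.5": 2_000, "Claude Opus 4.6": 300, "GPT-5.4 mini": 5_000}

old_total = sum(usage[m] * OLD[m] for m in usage)
new_total = sum(usage[m] * NEW[m] for m in usage)
print(f"effective requests: {old_total:,.0f} -> {new_total:,.0f} "
      f"({new_total / old_total:.1f}x)")
```

For this invented mix, the same usage costs about 11x the effective requests after June 1, driven mostly by the cheap model becoming expensive. A mix skewed toward Claude Sonnet 4 would barely move.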

Your AI tool just got a salary. Someone has to manage it.

For a team of any real size, this is no longer a procurement decision. It is an infrastructure decision.

When we looked at our own usage, the distribution was the part that surprised me. A small number of engineers running Copilot CLI for hours a day were responsible for most of the token consumption. The rest of the team barely touched the agentic features. That is not a problem under flat billing. The variance gets averaged out across the seat count and nobody notices.

Under usage-based billing, the variance is the bill.

This is the same pattern every cloud-billed product eventually surfaces. A handful of users drive most of the cost. Without visibility, the imbalance is invisible until the invoice arrives. With visibility, it becomes a leadership question: do we throttle the heavy users, or do we let them run because their output justifies it?

There is no clean answer. But you cannot make the call without knowing the data, and most teams do not have that data yet.
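Getting that data can start as a few lines over whatever per-user usage export you have. The numbers here are invented; the skew they illustrate is the pattern we saw in our own team.

```python
# Sketch: surface the usage skew before the invoice does.
# Per-engineer token counts are hypothetical.

usage = {
    "alice": 9_000_000, "bob": 6_500_000, "carol": 400_000,
    "dave": 250_000, "erin": 150_000, "frank": 100_000,
}

total = sum(usage.values())
ranked = sorted(usage.items(), key=lambda kv: kv[1], reverse=True)
top2 = sum(count for _, count in ranked[:2])
print(f"top 2 of {len(usage)} engineers: {top2 / total:.0%} of all tokens")
```

In this example, two engineers account for roughly 95% of consumption. The average per seat tells you nothing; the ranked list tells you where the bill comes from.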

If you have managed AWS/GCP costs at scale, this will feel familiar. If you have not, June 1 is your introduction.

The bigger signal

GitHub is not an outlier. They are just the first major AI coding tool to say out loud what everyone in the industry already knows: flat subscriptions do not work when the underlying cost is compute.

Cursor. Codeium. Amazon Q. Every AI tool running on a subscription right now is making a similar bet: that average usage stays low enough to absorb the heavy users. When it does not, or when inference costs do not drop fast enough, the repricing conversation happens. GitHub just had theirs in public, on a deadline, with a multiplier table.

The next tool will not give you a table. They will just quietly change what is included.

Every AI subscription you are paying right now is one bad quarter away from the same conversation.

Token economics is the new cloud economics. The teams that understand it now (what drives token consumption, how to pick the right model for the task, how to write prompts that do not waste context) will make better tooling decisions, spend less, and get more out of the tools they keep.

What to do before June 1

Run the preview bill first. Go to your Billing Overview page on github.com. The preview is live as of early May. Before you make any decision about your plan, see what your April usage actually translates to. If your number is anything like ours, you have planning to do.

Check your plan type. Monthly subscribers migrate automatically. Annual subscribers need to actively decide: stay on the old model with new multipliers, switch to monthly with a prorated refund, or upgrade to Pro+.

Audit your model usage. If you are regularly using Claude Sonnet 4.5, Claude Opus, or GPT-5.4, look at the multiplier table above and calculate the exposure against your current usage volume.

Set a spend limit. Under usage-based billing, you can cap additional credit purchases. Set it before June 1, not after your first surprise bill.

If you're managing a team, decide now whether model access should be open or restricted. Look at the distribution, not the average. Who are your heavy users? Are they your highest-output engineers, or is the agentic workflow being used as a substitute for thinking? That is a different conversation, but it is the one you should be having before the bill arrives.

The subscription model for AI was built for a simpler Copilot, one that autocompleted lines of code and answered quick questions. That Copilot is gone. What replaced it is closer to a junior engineer who can work autonomously for hours, read your entire codebase, and reason across complex problems.

That engineer does not work for $10 a month. They never did.

Now the bill reflects it.


Token consumption is the new performance metric, and most developers are not tracking it yet. In the next post, I will break down exactly how to reduce the tokens your Copilot sessions consume: prompt patterns, model selection, context hygiene, and the agentic loop habits that silently drain your credits.
