The Token Panic of 2026

What Dark Fiber Can and Can't Tell Us

In mid-2026 a quiet little index found its voice. The Silicon Data LLM Token Spending Index, which tracks the going rate paid per million tokens across the market, more than doubled from December through spring, then abruptly rolled over. Strategists who had been waiting for a crack in the AI trade pounced: if token spending is softening, maybe the whole hardware-and-data-center edifice is built on sand. Around the same time, ServiceNow and Uber blew through their entire annual AI budgets months early and had to renegotiate; GitHub moved Copilot to token-based billing, and some users watched their bills jump close to 19x; and Anthropic spent the spring throttling programmatic access to Claude because agent traffic was outrunning what a subscription could absorb.

Out of that anxiety comes a tidy narrative: AI prices are artificially low, the subsidies have to end, and when they do, demand collapses and the bubble pops.

I think that narrative is half right. Here's the case, both for and against, measured against the one historical episode that actually rhymes.

The optimist's thesis

Strip the hype away and the bullish position rests on four claims:

  • Most people are using models far more powerful than the job requires. Frontier reasoning models are being pointed at tasks a small model would handle for a fraction of the cost.

  • Open weights are nearly free. A genuinely competitive model can be downloaded and run without paying anyone a per-token toll.

  • Now is the time to learn, while the meter is subsidized. Capability per dollar has never been this cheap, and someone else is currently eating the loss.

  • Almost nobody is driving the car correctly. Most token spend is friction — bad prompts, retries, unstructured agents — not value. Skill changes the economics more than price does.

These are strategic claims, not macro forecasts, and that distinction turns out to matter a lot.

The pattern that rhymes: the fiber glut

The closest historical analogue isn't the dot-com retailers like Pets.com. It's the layer underneath them: the telecom and fiber buildout that financed the internet's plumbing.

After the Telecommunications Act of 1996, carriers poured more than $500 billion — most of it debt-financed — into laying over 80 million miles of fiber optic cable across the United States. The build was justified by a now-infamous WorldCom talking point: that internet traffic was doubling every 100 days. It wasn't. Demand was real and growing fast, but nowhere near that fast, and the spending was frontloaded against a forecast that didn't arrive on schedule. When the gap became undeniable, the sector imploded. Global Crossing lost $3.4 billion in a single quarter on under $800 million of revenue and filed for bankruptcy; WorldCom and Qwest followed; Corning, the largest fiber maker, fell from roughly $100 a share to about $1. Four years after the crash, 85–95% of the fiber laid in the 1990s was still "dark" — unused glass in the ground. Bandwidth prices fell about 90%.

And yet. Every frame of video you stream, every cloud workload you run, travels over that "wasted" fiber. The overcapacity of 1999 became the backbone of the 2010s. The technology was civilization-altering and the people who financed it were wiped out. Both things are true. That is the uncomfortable lesson of infrastructure booms, and it's the one the economist Carlota Perez formalized: transformative technologies reliably pass through an installation frenzy, a financial reckoning, and only then a long deployment golden age. The crash is not evidence the technology was fake. It's a recurring feature of how we pay for the real thing.

So the question isn't "is AI a bubble or a revolution." History says it can be both. The useful question is which parts of today's spending are dark fiber.

The case for the thesis

On the merits, the deflation is not a vibe — it's one of the best-documented trends in the field.

Stanford's 2025 AI Index found that the cost to run a model at GPT-3.5 quality fell from about $20 per million tokens in late 2022 to roughly $0.07 two years later — a ~280x collapse. Epoch AI, looking benchmark by benchmark, puts the rate of decline anywhere from 9x to 900x per year depending on the capability milestone, with GPT-4-level performance on hard science questions getting roughly 40x cheaper per year. Whatever number you trust, the direction is violent and consistent.
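The arithmetic behind that headline figure is easy to check. The sketch below uses only the two prices quoted above and treats "two years later" as exactly two years, which is an approximation; the function names are illustrative, not taken from any of the cited sources.

```python
def fold_decline(start_price, end_price):
    """How many times cheaper the end price is than the start price."""
    return start_price / end_price


def annualized_factor(total_fold, years):
    """Constant per-year factor that compounds to total_fold over `years`."""
    return total_fold ** (1.0 / years)


# Prices per million tokens as quoted in the text (GPT-3.5-quality tier).
total = fold_decline(20.00, 0.07)
# Assumption: the window is treated as exactly 2.0 years.
per_year = annualized_factor(total, 2.0)

print(f"total decline: {total:.0f}x")
print(f"implied average per year: {per_year:.1f}x")
```

On those inputs the total comes out near 286x, in line with the ~280x figure, and the implied average is roughly 17x per year, which sits inside the 9x to 900x range Epoch reports.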

Open weights make the point sharper. The gap between open and closed models on commodity benchmarks like MMLU narrowed from double digits to a fraction of a point inside a single year. DeepSeek's V4, Alibaba's Qwen 3.6, and GLM-5 now trade blows with closed frontier models on coding and reasoning — at 5–30x lower cost per token, and in many cases under permissive MIT or Apache licenses that allow self-hosting and commercial use with no API dependency at all. DeepSeek's V3 was reportedly trained for around $5.6 million in compute, against the roughly $100 million attributed to GPT-4's training. When near-frontier capability is available open-weight, the binding constraint stops being "can we afford the model" and becomes "can we design the system well."

Which is the fourth claim, and the one I'd defend most aggressively. A meaningful share of what gets reported as AI demand is actually waste: users writing bloated prompts to fence in a model that drifts, retrying tasks the agent couldn't plan across, paying frontier rates for classification a 4-billion-parameter model does fine. That's not demand. That's a skills gap denominated in tokens. Close it and your spend can fall by an order of magnitude without touching the price sheet.
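To see how an order of magnitude can come out of skill alone, here is a minimal sketch. Every number in it is hypothetical (the prices, task counts, prompt sizes, and retry rates are invented for illustration, not measured), and `monthly_cost` is a made-up helper rather than any vendor's billing formula. The only thing held fixed is the price sheet.

```python
def monthly_cost(tasks, tokens_per_task, attempts_per_task, price_per_million):
    """Dollar cost of a workload: tokens consumed times the per-million price."""
    total_tokens = tasks * tokens_per_task * attempts_per_task
    return total_tokens / 1_000_000 * price_per_million


# All values below are hypothetical, chosen only to illustrate the mechanism.
FRONTIER_PRICE = 10.00  # dollars per million tokens (assumed)
SMALL_PRICE = 0.50      # dollars per million tokens (assumed)

# Before: 100,000 tasks a month, all sent to the frontier model, with
# bloated 6,000-token prompts and an average of two attempts per task.
before = monthly_cost(100_000, 6_000, 2.0, FRONTIER_PRICE)

# After: identical prices. Prompts are trimmed to 3,000 tokens, retries fall
# to 1.2 attempts per task, and 80% of tasks are routed to the small model.
after_small = monthly_cost(80_000, 3_000, 1.2, SMALL_PRICE)
after_frontier = monthly_cost(20_000, 3_000, 1.2, FRONTIER_PRICE)
after = after_small + after_frontier

print(f"before: ${before:,.0f}")
print(f"after:  ${after:,.0f}")
print(f"reduction: {before / after:.1f}x")
```

With these made-up inputs the bill drops from about $12,000 to about $864, close to 14x, and most of the saving comes from routing rather than from prompt trimming. Real workloads will split differently.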

And the subsidy is real, which means the "learn now" instinct is sound. Industry reporting consistently shows the flagship labs spending more to serve inference than they collect, the difference covered by venture capital and hyperscaler cross-subsidy. You are, right now, being handed frontier capability below cost. Learning on someone else's dime is just good sense.

The case against the thesis

Now the part that should make an optimist uncomfortable.

Falling per-token prices do not mean falling bills, and conflating the two is the central error. This is the "token cost illusion": a team watches unit prices drop 75% year over year and still gets a bigger invoice, because consumption outran the discount. Gartner estimates that agentic workflows consume 5 to 30 times more tokens per task than a simple chatbot turn. A reasoning model that "thinks" before answering can burn thousands of tokens to produce a single answer. So when the optimist says "prices are falling," they're describing the trailing capability tier. The frontier is moving the other way: one 2025 analysis found the cost of running genuinely frontier-level capability has risen roughly 18x per year, because each marginal gain demands disproportionately more compute. Cheap is getting cheaper; the bleeding edge is getting more expensive. Both curves are real, and which one you live on is a choice you may not fully control.
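The illusion is just multiplication. The sketch below combines the two figures quoted above, a 75% unit-price drop and 5x to 30x more tokens per task, under the simplifying assumption that the number of tasks stays constant and the same price applies to every token. `bill_multiplier` is an illustrative name, not a standard metric.

```python
def bill_multiplier(price_drop, token_growth):
    """New bill divided by old bill, holding the number of tasks constant.

    price_drop   -- fractional fall in unit price (0.75 means 75% cheaper)
    token_growth -- factor by which tokens consumed per task increase
    """
    return (1.0 - price_drop) * token_growth


PRICE_DROP = 0.75

# Token growth needed to exactly cancel the discount.
break_even = 1.0 / (1.0 - PRICE_DROP)
print(f"break-even token growth: {break_even:.0f}x")

# 5x and 30x are the ends of the range quoted in the text.
for growth in (1, 5, 30):
    print(f"tokens x{growth:>2}: bill x{bill_multiplier(PRICE_DROP, growth):.2f}")
```

A 75% discount is cancelled once consumption grows 4x. At the low end of the quoted range the bill is already 1.25x larger; at the high end it is 7.5x larger.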

The repricing the panic is describing is already happening — and it isn't fictional. GitHub didn't move to usage-based billing for fun; its product chief said the flat-fee model was no longer sustainable under agentic load. ServiceNow and Uber didn't renegotiate because they wanted to. The era of all-you-can-eat AI subscriptions is ending, and for heavy users the true cost of usage-based pricing is a genuine shock, not a rounding error. So "token prices need to rise" is not simply false. Subscription prices and the economics of unlimited access are being repriced upward, even as the per-token commodity price falls.

The capex overhang is the dark-fiber risk, and it's enormous. The major hyperscalers are guiding to somewhere between $600 billion and $700 billion-plus in capital spending for 2026 — up from roughly $400 billion the year before — funded in part by over $100 billion of new debt, with capital intensity reaching 45–57% of revenue and free cash flow turning negative at the most aggressive spenders. That is a staggering bet on demand materializing on a particular schedule. It is precisely the structure that sank the fiber carriers: frontloaded, debt-financed, justified by a steep demand curve. If returns disappoint, capital flows reverse — and here's the trap for optimists: the same R&D and competition that drive prices down are funded by that capital. A sharp pullback could slow the very deflation the bullish case depends on.

"Practically free" hides the total cost. Open weights are free; running them is not. Self-hosting a serious model requires owning a serious machine and knowing how to operate it. And a real gap persists. As of April 2026 the best open-weight model still trailed the closed leaders on composite benchmarks, with the largest gaps exactly where it hurts: complex instruction-following, long-horizon agentic reliability, multimodal work.

The most likely outcome

Put the two halves together and the probable path looks less like a pop and more like a sorting.

The deflation continues for commoditized capability and the frontier stays expensive — the price curve bifurcates rather than collapses. Jevons' paradox keeps doing its work: as Satya Nadella noted after DeepSeek's first shock, making a resource cheaper tends to increase, not decrease, total consumption. Cheaper tokens mean more uses, which means aggregate spend keeps climbing even as each token costs less. So total industry revenue can grow while individual unit economics improve — the bullish and bearish facts coexist.
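Whether cheaper tokens raise or lower total spend depends on how strongly demand responds to price. One textbook way to sketch that is a constant-elasticity demand curve, which is an assumption introduced here for illustration and not a model used by any of the sources; the elasticity values are hypothetical as well.

```python
def spend_ratio(price_ratio, elasticity):
    """New total spend divided by old total spend.

    Assumes constant-elasticity demand: quantity scales as
    price_ratio ** (-elasticity), so spend (price times quantity)
    scales as price_ratio ** (1 - elasticity).
    """
    return price_ratio ** (1.0 - elasticity)


PRICE_RATIO = 0.25  # unit price falls to a quarter of its old level

# Hypothetical elasticities: inelastic, unit-elastic, elastic.
for elasticity in (0.5, 1.0, 1.5):
    ratio = spend_ratio(PRICE_RATIO, elasticity)
    print(f"elasticity {elasticity}: total spend x{ratio:.2f}")
```

Under this assumption, aggregate spend rises only when elasticity exceeds 1: at 1.5 a 75% price cut doubles spend, at 0.5 it halves it. The Jevons reading of the AI market amounts to a claim that token demand sits on the elastic side of that line.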

Subscriptions reprice and usage-based billing wins. The "unlimited for $20" era is closing. Expect sticker shock for power users and a wave of FinOps-style discipline, where teams that never tracked token spend suddenly have to. This is the repricing the panic is sensing — real, but a normalization, not an apocalypse.

An infrastructure shakeout is plausible without AI "ending." Some data-center bets will sour; some debt-financed players will be stressed; a few capacity bets will sit dark for a while. Then, in the dark-fiber pattern, the assets get repriced and absorbed by whoever has cash and patience. The technology keeps compounding through the financial reckoning, because the demand underneath it — unlike WorldCom's fantasy — is demonstrably here and growing.

Two things separate this episode from 1999, and they cut in the optimist's favor: today's spenders are deeply cash-generative incumbents, not story-stock startups living on bond issuance, and AI is usable the instant you have an internet connection — no last-mile buildout required before the demand can show up. The demand is real now. That doesn't prevent overbuilding. It does make a total collapse far less likely than the panic implies.

So who's right?

The strategic conclusions hold up. Match the model to the task. Build on portable, open foundations where the capability gap is invisible to your use case. Treat tokens as a measured cost, not a feeling. And above all, develop genuine skill with the models and the harnesses, because that — not the price sheet — is the biggest lever on what AI actually costs you. Learn now, while the meter is subsidized. All of that is correct, and most of the market still isn't doing it.

The one correction I'd make is to the framing. "The pullback is a false narrative" is too strong, because it conflates two different prices. Per-token capability is getting cheaper and will keep getting cheaper. But the price of unlimited, subsidized, frontier access is rising, and people who built plans on the assumption it never would are about to find out the difference the expensive way.

The pullback isn't a false narrative. It's a true narrative about the wrong number — and the people who understand which number is which are the ones who'll be standing when the frenzy fades and the deployment era actually begins.


Sources: Stanford HAI AI Index 2025; Epoch AI (LLM inference price trends); Andreessen Horowitz ("LLMflation"); Gartner (inference cost and agentic token consumption forecasts); Silicon Data LLM Token Spending Index; CNBC, Financial Times, and Tom's Hardware reporting on 2025–2026 hyperscaler capital expenditure; IEEE ComSoc and contemporaneous reporting on the 1996–2002 telecom/fiber buildout; TechCrunch and open-weight model benchmark trackers on DeepSeek V4, Qwen 3.6, and GLM-5; Derek Thompson, "The Great AI Cost Panic of 2026." Figures are paraphrased and, where noted, are estimates or ranges that vary across sources.