BLOG // 2026.04.12 // 10:00 SGT
The Token Bill Is Due: The Brutal Unit Economics of AI Agents
The delusion of endlessly cheap compute is over—autonomous agents devour tokens at a scale that is forcing a brutal reality check on the unit economics of AI.
We are hitting the wall of physics and finance. The honeymoon phase of generative AI is definitively over, and the market is finally looking at the bill.
For the last three years, the industry operated under the delusion that compute would keep getting cheaper while capabilities kept scaling upward. But you cannot brute-force intelligence without breaking the balance sheet. The news this morning highlights a hard reality: excessive token consumption is pushing AI giants toward financial rupture.
We are no longer just paying for simple prompt completions. We are paying for autonomous agents that loop, iterate, and devour tokens by the millions in the background to complete a single workflow.
The Unit Economics of Agentic Architecture

Look closely at OpenAI's new $100 Pro tier. It features 5x usage limits specifically for Codex. This reveals far more about the baseline requirements of agent architecture than it does about their consumer pricing strategy. Code generation and agentic reasoning require massive, sustained context windows. When your infrastructure costs grow faster than linearly with agent loops, because each iteration typically re-sends a growing context, but your revenue scales linearly with human subscriptions, your margins inevitably collapse.
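To make the margin argument concrete, here is a minimal back-of-the-envelope sketch. It assumes an agent that re-sends its whole accumulated context as input on every step, ignores output tokens and prompt caching, and uses made-up numbers throughout: the function names, the context sizes, the 200 runs per month, and the $3.00 per million tokens price are all illustrative assumptions, not figures from any vendor.

```python
def agent_run_input_tokens(steps: int, base_context: int, added_per_step: int) -> int:
    """Total input tokens billed for one agent run.

    Assumption: every step re-sends the full accumulated context, and each
    step appends `added_per_step` tokens (tool output, model reply) to it.
    """
    total = 0
    context = base_context
    for _ in range(steps):
        total += context            # the whole context is billed again on this step
        context += added_per_step   # the context grows before the next step
    return total


def monthly_margin_usd(
    subscription_usd: float,
    runs_per_month: int,
    steps: int,
    usd_per_million_tokens: float,
    base_context: int = 2_000,      # hypothetical starting prompt size
    added_per_step: int = 1_500,    # hypothetical growth per loop iteration
) -> float:
    """Flat subscription revenue minus token cost for one subscriber."""
    tokens = runs_per_month * agent_run_input_tokens(steps, base_context, added_per_step)
    cost = tokens / 1_000_000 * usd_per_million_tokens
    return subscription_usd - cost


if __name__ == "__main__":
    for steps in (5, 10, 20, 40):
        tokens = agent_run_input_tokens(steps, 2_000, 1_500)
        margin = monthly_margin_usd(100.0, 200, steps, 3.0)
        print(f"{steps:>2} steps: {tokens:>9,} tokens per run, margin ${margin:,.2f}")
```

Under these assumptions, going from 10 to 20 steps per run raises input tokens from 87,500 to 325,000, roughly 3.7x for 2x the steps, and the margin on a $100 subscription flips from +$47.50 to -$95.00. The growth is quadratic in the number of steps rather than literally exponential, but the subscription price does not move at all.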
The market's immune response to this is a violent correction toward open reasoning models. We are now seeing models like Arcee's Trinity-Large pushing 398B parameters for $0.90. This is the order-of-magnitude shift required to make agentic workflows actually viable in a production P&L. If you are a startup CTO relying entirely on closed-source, pay-per-token API calls for your core agentic loops, your runway is already gone; you just haven't checked the dashboard yet.
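A rough way to check that dashboard is to put the token bill next to the rest of the burn. The sketch below is illustrative only: it assumes the $0.90 figure is a price per million tokens (the unit is an assumption here), and the $10.00 closed-API price, workload size, cash balance, and fixed burn are hypothetical placeholders to be swapped for real numbers.

```python
def monthly_token_bill_usd(
    workflows_per_month: int,
    tokens_per_workflow: int,
    usd_per_million_tokens: float,
) -> float:
    """Monthly spend on tokens for a fixed agentic workload."""
    total_tokens = workflows_per_month * tokens_per_workflow
    return total_tokens / 1_000_000 * usd_per_million_tokens


def runway_months(cash_usd: float, fixed_burn_usd: float, token_bill_usd: float) -> float:
    """Months of cash left when the token bill is added to the fixed burn."""
    return cash_usd / (fixed_burn_usd + token_bill_usd)


if __name__ == "__main__":
    # All inputs below are hypothetical.
    workflows = 50_000            # agent workflows per month
    tokens_each = 325_000         # tokens consumed per workflow
    cash = 2_000_000.0            # cash in the bank, USD
    fixed_burn = 100_000.0        # salaries, infra, everything except tokens

    for label, price in (("closed API (assumed $10.00/M)", 10.0),
                         ("open model (assumed $0.90/M)", 0.90)):
        bill = monthly_token_bill_usd(workflows, tokens_each, price)
        months = runway_months(cash, fixed_burn, bill)
        print(f"{label}: bill ${bill:,.0f}/month, runway {months:.1f} months")
```

With these placeholder inputs the token bill drops from $162,500 to about $14,625 a month, and runway stretches from roughly 7.6 months to roughly 17.4. The specific numbers matter less than the shape: once the token bill is larger than the fixed burn, the price per token is the runway.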
The Velocity Paradox in the Enterprise

Operating across APAC — from my time at Amazon to scaling teams at GoPomelo and Digital China — I’ve watched enterprises buy into every hype cycle. Right now, the mandate from every board in Singapore is to "move faster on AI." But what are they actually shipping? Demos.
The reality of enterprise deployment is friction. Infosys just partnered with Harness to solve what they accurately call the "AI velocity paradox." The paradox is simple: our ability to write code using AI has vastly outpaced our ability to secure, govern, and deploy it.
You can generate a microservice in ten seconds today. Can you pass compliance, security audits, and integration testing in ten days? Usually not. We have optimized the absolute cheapest part of the software development lifecycle — writing boilerplate — and entirely ignored the expensive parts. Shipping a demo is easy. Deploying resilient infrastructure that doesn't leak customer data is hard work.
The Inversion of Labor and Attention

Time is the ultimate constraint. As operators, we only have so much of it to allocate across three domains: career, family, and finance. We assumed AI would give us more time by acting as our tireless interns. We were wrong.
We have crossed a bizarre threshold where bots have started hiring humans. Agents are now dispatching micro-tasks to human workers for physical or edge-case workflows they cannot solve digitally. The dynamic has inverted. You aren't managing the AI — the AI is managing your queue.
Simultaneously, the consumer web is dying under its own synthetic weight. NYT reporter Tiffany Hsu recently highlighted how the sheer volume of AI-generated online influencers and synthetic content is exhausting users. Why are we surprised that users are tuning out? We built infinite content engines but forgot that human attention is strictly finite.
Stop building infinite content generators that nobody asked for. Stop ignoring the unit economics of your agentic loops. The winners of this cycle will not be the ones with the flashiest synthetic influencers or the most expensive API wrappers. The winners will be the operators who figure out how to run 398B-parameter reasoning models at ninety cents, solve the deployment paradox in the enterprise, and build tools that actually give humans their time back.