hlfshell

#coding agents

What comes after the token discount bubble pops?

Coding agents are amazing - sure. I’m a fan and heavy user of them too, especially CLI agents. BUT - the existing coding subscriptions heavily discount token usage, to the point that most users are unaware of just how many tokens they are actually burning at any given moment to run their agents. This blissful ignorance is perfectly fine as long as the discounts continue; the pain of running the agents isn’t being felt by the end user.

Why are they so heavily discounted? The big AI model providers are racing to secure funding and users, resulting in heavily (prohibitively and unsustainably) discounted tokens to build a user base dependent on them as a provider. Either users will have no choice but to deal with price hikes, or one of these providers will be the “last man standing” and finally make good on their untenable investor promises.

When the AI economic bubble pops for these providers (which, barring any major discovery in terms of hardware / model architecture, will happen), we’re likely going to see a large increase in token pricing. The next step? I see three possible outcomes.

1 - Tooling favors mixed model approaches, where specially tuned agents route to different model sizes of varying costs based on task categorization and user cost preferences. We don’t need Claude Opus to center a div, but we do need it for a very complex architectural change. It wouldn’t surprise me if, for internal cost savings, this approach is implemented by the providers themselves as a singular packaged model endpoint or incorporated into the tooling.

2 - We see a sharp rise of business-focused large scale token pricing packages, similar to reserved instance pricing in cloud infrastructure. Buying a year’s worth of tokens up front based on projected dev usage nets businesses discounts. This doesn’t bode well for model providers though - it creates a larger scale race to the bottom.

3 - Developers and product builders start buying the best possible pro-sumer hardware for headless agent machines (the current-day equivalent at time of posting would be something like an AMD Strix Halo processor, as found in the Framework Desktop, which can run heavily quantized 70B models) to run smaller models that can do MOST work, but can call out or hand off to a more expensive model as need be. A kind of local model play on #1.
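The routing idea behind #1 (and its local variant in #3) could be sketched roughly like this - note the model tiers, prices, and the crude keyword-based task classifier are all illustrative assumptions on my part, not anyone’s real implementation:

```python
# Hypothetical sketch of a cost-aware model router: classify the task,
# then route to the cheapest tier judged capable of handling it.
# Tier names, prices, and the keyword heuristic are illustrative only.

from dataclasses import dataclass

@dataclass
class ModelTier:
    name: str
    cost_per_mtok: float  # illustrative USD per million tokens

# Cheapest first; "local-7b" stands in for a model on owned hardware.
TIERS = [
    ModelTier("local-7b", 0.0),
    ModelTier("mid-cloud", 1.0),
    ModelTier("frontier", 15.0),
]

# Naive signal words for "hard" tasks; a real router would more likely
# use a small classifier model or heuristics over context/diff size.
HARD_SIGNALS = ("architecture", "refactor", "concurrency", "migration")

def route(task: str, budget_per_mtok: float) -> ModelTier:
    """Pick the cheapest affordable tier believed capable of the task."""
    difficulty = 2 if any(s in task.lower() for s in HARD_SIGNALS) else 0
    affordable = [t for t in TIERS if t.cost_per_mtok <= budget_per_mtok]
    # Skip tiers below the estimated difficulty; fall back to the best
    # tier the budget allows if nothing stronger is affordable.
    candidates = affordable[difficulty:] or affordable[-1:]
    return candidates[0]

print(route("center a div on the landing page", budget_per_mtok=20.0).name)
print(route("plan the concurrency architecture migration", budget_per_mtok=20.0).name)
```

The div-centering request lands on the free local model, while the architectural task escalates to the frontier tier - the same shape whether the cheap tier is a discounted cloud model or a box under your desk.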

I’m personally hoping for some mixture of 3 and 2. I would love to be able to run models locally, but understand that the hardware is a long way from becoming cheap and consumer grade. Being able to reserve token pricing in bulk while handing most work to “capable enough” models, occasionally “upscaling” to stronger ones, is likely key.

Plus I just prefer local, user owned compute.

#AI #coding agents