The Upgrade Treadmill — Why The New Version Always Calls
GPT-5 just dropped. Or Claude 4. Or Gemini Ultra 2.0. Or whatever it is this week. You read the announcement, scanned the benchmarks, felt a pulse of urgency, and started thinking about whether you should switch. You didn't finish the project you were working on with the current version. You won't finish it with the new one either. But for about 45 minutes, the upgrade felt like progress.
The Pattern
The AI tool landscape has a release cycle problem — not for the companies, who benefit enormously from it, but for the users who get trapped in it. Major model updates arrive every few months. Minor updates arrive weekly. Each one is framed as a step change. "Now 2x faster." "40% better at reasoning." "New multimodal capabilities." The framing is always the same: what you had before was insufficient, and what we have now is essential.
Here's what the upgrade cycle actually looks like from the user's side. Monday: you see the announcement. You read the blog post, skim the benchmarks, watch the demo video. Tuesday: you sign up for early access or upgrade your tier. You run your favorite test prompts through the new version. The outputs are — maybe — slightly different. Maybe better in some ways, worse in others. It's hard to tell. Wednesday: you've spent three hours testing the new version and zero hours doing the work you were doing with the old version. Thursday: you go back to your project, but now you're second-guessing whether to use the old version or the new one. Friday: a think piece drops arguing the new version is actually worse at your specific use case. You feel confused. The following Monday: another tool announces an update. The cycle restarts.
The treadmill metaphor is precise. You're expending real energy — time, attention, cognitive bandwidth — and ending up in exactly the same place. Or worse, because the old version you were competent with now has a new UI, changed defaults, and three features that moved to different menus.
The pattern accelerates over time. Once you're on the treadmill, every release feels relevant. You develop a monitoring habit — checking X, reading AI newsletters, watching YouTube breakdowns — that consumes hours per week even when you don't upgrade. The evaluation itself becomes a time sink, independent of the decision.
The Psychology
The upgrade treadmill runs on three psychological engines, and the companies building these tools understand all of them.
The implied inadequacy of the old version. Every "now with" announcement implicitly argues that the previous version was lacking. "Now with advanced reasoning" suggests the old version's reasoning was primitive. "2x faster" suggests the old version was too slow. These implications are almost never true for the user's actual workload. The old version was fine. It was doing what you needed. But the announcement reframes "fine" as "outdated" — and outdated feels uncomfortable.
The benchmark trap. New model releases come with benchmark scores. MMLU went from 86.4% to 89.1%. HumanEval went from 67% to 73%. These numbers are real, but they measure performance on tasks you will almost certainly never encounter in your daily work. There is little rigorous research linking benchmark gains to real-world user satisfaction, and the anecdotal evidence from developer surveys suggests that once scores are already high, further improvements rarely translate into differences you can feel in typical use. The benchmarks exist to give you a number to compare, because numbers feel objective even when they're measuring the wrong thing. Nobody publishes a benchmark for "how well does this model help a freelancer write client emails" — because that benchmark would show negligible differences between versions.
The manufactured scarcity of digital goods. Waitlists. Early access. Limited-time pricing. "First 1,000 users get the Pro tier free." These are scarcity signals applied to products with zero marginal cost. There is no supply constraint on a software subscription. The waitlist exists to create urgency, not to manage demand. The early access exists to make you feel special for adopting quickly, which makes you more likely to evangelize the product and less likely to evaluate it critically.
There's a subtler mechanism at work too — what you might call the "just testing" rationalization. "I'm not switching, I'm just trying it out." But testing is switching with plausible deniability. Testing takes time. Testing interrupts your current workflow. Testing creates a comparison frame that makes your current tool feel inadequate, even if it was working perfectly an hour ago. "Just testing" is the gateway drug of the upgrade treadmill.
The deepest cut is the sunk cost inversion. Normally, sunk cost keeps you with your current tool — you've invested time learning it, so you stay. But the upgrade treadmill inverts this. The investment in your current tool makes you feel entitled to the upgrade. "I've been paying for this for six months — I deserve the latest version." The loyalty becomes a claim, and the claim becomes a justification for spending more time evaluating.
The Fix
The fix is a schedule, not willpower. Willpower fails because every release announcement is specifically designed to overcome it.
Set a quarterly review cadence. Once every 90 days, you evaluate your tool stack. You read the release notes for anything that launched since your last review. You test anything that looks genuinely relevant to your workflow. You make upgrade decisions. Between reviews, you ignore all releases. Not "try to ignore" — actually ignore. Mute the announcement channels. Unsubscribe from the launch newsletters. The information will still be there in 90 days, and it'll be more reliable because the bugs will be fixed and the hype will have cooled.
Define "genuinely relevant" before you start the review. Write down three criteria specific to your work. "Handles code generation in Python" or "produces first drafts under 500 words that require fewer than three editing passes" or "maintains context across conversations longer than 20 turns." If the upgrade doesn't touch one of your three criteria, it's not relevant, regardless of how impressive the demo looks.
When upgrades actually matter. Security patches — always install these, but these aren't what we're talking about. Genuine capability jumps for your specific use case — if you needed image generation and your current tool just added it, that's worth evaluating at your next quarterly review. Price changes that affect your budget — if a competitor now offers what you need at half the cost, review it. Everything else — speed bumps, benchmark gains, UI refreshes, "new features" you'll use once — can wait.
The 90-day rule in practice. A new model drops. Your feed explodes with takes. You feel the pull. You check your calendar — your next review is in six weeks. You note the release in a document and move on. Six weeks later, you review. The initial hype has settled. Independent evaluations exist. Known bugs are documented. You can make an informed decision in 30 minutes instead of spending 30 hours chasing real-time information during launch week. The 90-day rule doesn't make you slower. It makes you calmer — and calm decisions are better decisions.
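The parking lot can literally be a text file and a date. Here's a minimal sketch, assuming a 90-day cadence; the file name, the last-review date, and the example release are placeholders, not recommendations.

```python
# parking_lot.py -- a minimal sketch of "note the release and move on",
# assuming a fixed 90-day review cadence.

from datetime import date, timedelta

REVIEW_EVERY = timedelta(days=90)
LAST_REVIEW = date(2025, 1, 15)          # whenever your last review actually was
LOG_FILE = "release_parking_lot.txt"

def park(release_note: str) -> None:
    """Append the announcement to the log so you can forget it until review day."""
    with open(LOG_FILE, "a", encoding="utf-8") as f:
        f.write(f"{date.today().isoformat()}  {release_note}\n")

def days_until_review() -> int:
    """How long the announcement waits before it gets any of your attention."""
    return max(0, ((LAST_REVIEW + REVIEW_EVERY) - date.today()).days)

park("ExampleModel 3.1 announced, claims better reasoning")
print(f"Parked. Next review in {days_until_review()} days.")
```

On review day you read the log top to bottom; in my experience most parked entries will have stopped mattering on their own, which is its own kind of answer.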
The upgrade treadmill stops when you stop running on it. The new version will always call. You just don't have to answer on the first ring.
This is part of CustomClanker's Tool Collector series — 14 subscriptions, zero running workflows.