The Upgrade Treadmill — Why The New Version Always Calls
There is a new model release roughly every two weeks now. Each one is framed as a leap forward. Each one makes your current setup feel suddenly insufficient — even though it was working fine yesterday. This article is about the treadmill: the mechanism that keeps you evaluating instead of executing, and why the new version almost never matters as much as the announcement implies.
The Pattern
The cycle goes like this. You open X or Hacker News and see the announcement: "GPT-5 Turbo is here — 2x faster, 40% better on MMLU, native tool use, 1M context." You read the thread. You read the blog post. You look at the benchmarks. You feel a specific sensation — not quite excitement, not quite anxiety, something in between — that translates roughly to: "my current setup is now obsolete."
By lunchtime you've signed up for the new model, or upgraded your existing subscription, or joined a waitlist. You run your standard test prompts. The results are... fine. Maybe marginally better in some areas. Maybe worse in others. The "2x faster" claim is real but irrelevant — you were never bottlenecked by inference speed. The "40% better on MMLU" is real but academic — you've never encountered an MMLU-style problem in your actual work. The 1M context window is genuinely new, but you've never had a document longer than 20,000 tokens.
None of this matters. The upgrade has already happened. You've already switched your default model. You've already started adjusting your prompts to the new model's tendencies. You've already lost two hours of productive time to "testing." And in three weeks, when the next release drops, you'll do it again.
The treadmill is not about any single upgrade. It is about the cadence — the relentless rhythm of releases that keeps you in a permanent state of evaluation. You are never settled. You are never proficient. You are always in transition, always adjusting, always one release away from "having the right setup." The right setup never arrives because the releases never stop.
The Psychology
Release announcements are engineered — not in a conspiracy-theory way, but in a straightforward marketing way — to make the previous version feel inadequate. The framing is always comparative and always upward. "Now 2x faster" implies the old version was too slow. "New capabilities" implies the old capabilities were insufficient. "Major update" implies everything before it was minor. The language of progress is also the language of obsolescence, and it works because humans are loss-averse. We don't upgrade because the new thing is great. We upgrade because the old thing suddenly feels like a liability.
The benchmark trap is the most reliable lever in this machine. A new model scores 3% better on a standardized test that measures capabilities you will never use in your actual workflow. But the number is concrete, legible, and comparable — three properties that make it feel meaningful even when it isn't. "92.4 vs. 89.7 on HumanEval" sounds like a reason to switch. It is not, unless you are literally running HumanEval problems as part of your job. For everyone else — which is almost everyone — the benchmark improvement is a spectator sport masquerading as a purchase decision.
There's a manufactured scarcity layer too. Waitlists for digital goods. "Early access" for products with effectively unlimited supply. Limited-time pricing for subscriptions that will still be available next month. These are urgency mechanisms borrowed from physical retail. In fairness, inference capacity is a real constraint, and some waitlists do reflect genuine GPU scarcity. But the mechanism works on you the same way regardless: waiting creates desire, and a waitlist manufactures waiting.
Then there's the rationalization that makes the treadmill invisible to the person on it: "I'm not switching, I'm just testing." Testing takes time. Testing requires context-switching. Testing means your current workflow is paused while you evaluate whether a different workflow might be better. Testing is switching with plausible deniability. The tool collector who "tests" every new release is switching constantly — they just never frame it that way because "I'm a careful evaluator" is a more flattering identity than "I can't commit to a tool for more than three weeks."
The hidden cost is not the subscription fee. It is the switching cost buried inside what looks like a simple upgrade. Even upgrading within the same tool — same company, same product, new version — costs more than the changelog suggests. The UI has changed. Defaults have shifted. Features you relied on have moved or been renamed. Your muscle memory is wrong now. Prompts that worked perfectly produce different results. None of these are bugs, exactly. They are the tax you pay every time you step off one moving walkway and onto another.
The Fix
Set a quarterly review cadence and ignore everything between reviews.
This sounds extreme and it is not. It is the same approach that competent IT departments use for software updates — they don't chase every point release, they evaluate on a schedule and upgrade when the evidence supports it. You are the IT department of your own workflow. Act like it.
Here is what the cadence looks like in practice. Pick a date — say, the first Monday of every quarter. On that day, and only that day, you review what has changed in the tools you use. Read the release notes for the last 90 days. Try the new features that are relevant to your specific work — not the features that got the most Twitter engagement, but the features that address tasks you actually perform. If an upgrade is genuinely warranted — a real capability you need, a real performance improvement on tasks you do — make the switch. If not, you're done until next quarter.
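If it helps to make the cadence mechanical rather than a matter of willpower, here is a minimal sketch in Python. The first-Monday-of-the-quarter rule and the function names are just this article's example, not anything a vendor ships; swap in whatever date rule you actually picked.

```python
from datetime import date, timedelta

def first_monday_of_quarter(d: date) -> date:
    """First Monday of the quarter that contains d."""
    quarter_start = date(d.year, 3 * ((d.month - 1) // 3) + 1, 1)
    # date.weekday(): Monday == 0, so this offset walks forward to a Monday
    return quarter_start + timedelta(days=(0 - quarter_start.weekday()) % 7)

def next_review(today: date) -> date:
    """This quarter's review date, or next quarter's if it has already passed."""
    review = first_monday_of_quarter(today)
    if today <= review:
        return review
    month = 3 * ((today.month - 1) // 3) + 4  # first month of the next quarter
    if month > 12:
        return first_monday_of_quarter(date(today.year + 1, 1, 1))
    return first_monday_of_quarter(date(today.year, month, 1))

if __name__ == "__main__":
    today = date.today()
    review = next_review(today)
    if today == review:
        print("Review day: read 90 days of release notes; test only what's relevant to your work.")
    else:
        print(f"Not a review day. Next review: {review}. Ignore releases until then.")
```

Run it from your shell profile or a cron job and "should I look at this release?" stops being a judgment call you make fresh every morning.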
Between reviews, you ignore releases. Unsubscribe from product update emails. Mute the announcement accounts. When someone says "have you tried the new version of X," your answer is "I'll evaluate it at my next quarterly review." This will feel strange the first time you say it. It will feel powerful the second time. You are declining the treadmill. You are choosing depth over novelty. You are telling the release cycle that it does not set your schedule.
When do upgrades actually matter? Three cases. Security patches — if a genuine vulnerability is disclosed, upgrade immediately, obviously. Capability jumps that address a specific, documented limitation in your current workflow — not "it's better at reasoning" (too vague) but "it now supports the file format I need for my actual project" (specific and testable). And price changes that materially affect your budget — if the same capability is now available at half the cost, that's worth evaluating. That's roughly the complete list. Everything else can wait 90 days. If a feature is genuinely important, it will still be there in three months — and the bugs will be fixed, the documentation will be written, and the early-adopter chaos will have settled into something stable.
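Those three cases reduce to a checklist you can literally write down. A sketch, again in Python, with field names that are mine rather than anything standard:

```python
from dataclasses import dataclass

@dataclass
class Release:
    name: str
    # The only three properties that justify breaking the quarterly cadence:
    patches_disclosed_vulnerability: bool = False  # a real, published security issue
    fixes_documented_limitation: bool = False      # e.g. "now reads the file format my project needs"
    materially_cheaper: bool = False               # same capability at a price that changes your budget

def act_now(release: Release) -> bool:
    """True only for the three exceptions; everything else waits for the review."""
    return (release.patches_disclosed_vulnerability
            or release.fixes_documented_limitation
            or release.materially_cheaper)

# "2x faster, 40% better on MMLU, 1M context" hits none of the three fields,
# so the announcement from The Pattern above waits 90 days:
print(act_now(Release("GPT-5 Turbo")))  # False
```

The point is not the code. It is that "better at reasoning" has nowhere to go in that struct, which is exactly the filter doing its job.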
The treadmill stops when you decide it stops. Not when the companies stop releasing new versions — that will never happen. Not when you find the perfect tool — that doesn't exist. It stops when you stop treating every release announcement as a call to action and start treating it as what it is: marketing. Good marketing. Effective marketing. But marketing nonetheless, serving the company's growth metrics rather than your productivity. The new version will always call. You do not have to answer.
This is part of CustomClanker's Tool Collector series — 14 subscriptions, zero running workflows.