The Agent Hype Cycle: Where We Actually Are
In March 2023, AutoGPT hit GitHub and broke the star counter. The pitch was irresistible: give GPT-4 a goal, let it break the goal into tasks, let it execute those tasks, and watch it work autonomously. People genuinely believed general-purpose AI agents were weeks away from replacing knowledge workers. Three years later, the general-purpose autonomous agent still doesn't exist. What exists instead is more useful and less exciting — narrow agents that do specific jobs reliably, embedded in products you already use. The hype cycle has done what hype cycles do, and knowing where we are on the curve is the difference between making good bets and making expensive ones.
What Actually Happened (The Timeline)
March 2023: AutoGPT. Toran Bruce Richards released a Python script that gave GPT-4 a recursive planning loop with memory and internet access. It hit 100K GitHub stars faster than any project in history. The demos were spectacular — the agent researching topics, writing files, browsing the web, iterating on its own output. The reality was infinite loops, cost explosions, and goal drift within minutes. But the vision — an AI that decomposes goals and executes them autonomously — captured something real. Every agent framework that followed is a direct descendant of the problem AutoGPT tried to solve.
April 2023: BabyAGI. Yohei Nakajima stripped the concept down to its essentials — a task creation agent, a task execution agent, and a task prioritization agent sharing a vector database. It was simpler and more instructive than AutoGPT, and it laid bare the fundamental architecture: plan, execute, evaluate, replan. BabyAGI proved the concept could work on toy problems. It also proved that "works on toy problems" and "works on real problems" are separated by an engineering chasm.
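That plan, execute, evaluate, replan architecture fits in a few dozen lines, which is part of why BabyAGI was so instructive. The sketch below is not BabyAGI's actual code; the three stub functions are hypothetical stand-ins for its LLM-backed task creation, execution, and prioritization agents, and the hard iteration cap reflects the lesson that the loop never terminates reliably on its own.

```python
from collections import deque

def execute_task(objective: str, task: str) -> str:
    """Stand-in for the execution agent (a real LLM call in BabyAGI)."""
    return f"result of {task!r} toward {objective!r}"

def create_tasks(objective: str, last_result: str, pending: list[str]) -> list[str]:
    """Stand-in for the task-creation agent: propose follow-up tasks."""
    return []  # a real implementation would prompt the model here

def prioritize(tasks: deque) -> deque:
    """Stand-in for the prioritization agent: reorder the queue."""
    return deque(sorted(tasks))

def run(objective: str, initial_task: str, max_iterations: int = 5) -> list[str]:
    tasks = deque([initial_task])
    results = []
    for _ in range(max_iterations):  # hard cap: without it, cost spirals
        if not tasks:
            break
        task = tasks.popleft()
        result = execute_task(objective, task)                      # execute
        results.append(result)
        tasks.extend(create_tasks(objective, result, list(tasks)))  # replan
        tasks = prioritize(tasks)                                   # reprioritize
    return results
```

With real model calls behind the stubs, every design problem the article describes shows up immediately: how to stop, how to cap spend, how to keep new tasks pointed at the objective.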
Late 2023: The frameworks arrive. CrewAI, LangGraph, AutoGen — each one took the agent loop and wrapped it in a different abstraction. CrewAI said agents work better in teams with roles. LangGraph said agents are state machines. AutoGen said agents are participants in structured conversations. The diversity was healthy. The marketing was not — every framework implied its abstraction was the one that would make agents reliable. None of them did, because reliability was a model problem, not a framework problem.
Early 2024: Devin. Cognition Labs released a demo showing an AI "software engineer" that could take a GitHub issue and produce a pull request autonomously. It had a sandboxed environment with a browser, terminal, and code editor. The launch video got 40 million views. The backlash was swift — people found evidence that the demo was misleading, that the tasks were simpler than presented, that human intervention was edited out. But Devin moved the conversation from "can agents do things" to "can agents do things well enough to justify their cost." That was progress, even if the answer was "not yet."
Late 2024-2025: Coding agents mature. Cursor added agent features. GitHub Copilot added agent features. Then, in early 2025, Claude Code launched and, quietly and without a viral demo, became the first agent that developers actually used daily. Not because it was autonomous, but because it was reliable enough on scoped tasks. The pattern was clear: agents worked when embedded in existing tools with clear scope, not when deployed as standalone autonomous systems.
2025: OpenAI Agents SDK, the platform play. OpenAI released an SDK for building agents on its models — tool use, handoffs, guardrails, tracing. This marked the shift from "agent as product" to "agent as infrastructure." Building an agent went from a research project to a library import. The barrier to entry dropped, and the average quality of agent deployments dropped with it, because more people could now build agents but the reliability problems remained unsolved.
2026: Where we are now. The general-purpose autonomous agent is dead. Not "dead" as in abandoned — "dead" as in the concept has been refined out of existence. Nobody serious is building an agent that can "do anything." The surviving products do one thing or a few things, scoped tightly, with human oversight. The interesting work is in making these narrow agents more reliable, cheaper to run, and easier to build.
What Died
Fully autonomous general-purpose agents. The AutoGPT vision — give it a goal, walk away, come back to results — does not work for non-trivial tasks with current technology. The failure modes (hallucinated actions, goal drift, cost spirals, error cascades) are well-documented and fundamental. They stem from the fact that LLMs are not reliable enough for long unsupervised chains of consequential decisions. This will improve. It hasn't improved enough.
"AGI in a loop" fantasies. The idea that wrapping GPT-4 in a while loop with internet access would produce emergent general intelligence was always a category error. Recursive prompting amplifies both the model's capabilities and its failure modes. The capabilities plateau while the failure modes compound. The math doesn't work, and three years of evidence confirms it doesn't work.
Agents that replace entire job functions. "AI will replace software engineers / analysts / writers / customer support" was the 2023-2024 forecast. What actually happened: AI made parts of those jobs faster while making other parts — the review, verification, and correction of AI output — newly necessary. The net effect on most job functions has been productivity improvement, not replacement. The exceptions are narrow: tasks that were already well-defined, repetitive, and verifiable are genuinely automated. Everything else got a copilot.
What Survived
Narrowly scoped task agents. Agents that classify support tickets. Agents that generate and run tests. Agents that extract structured data from unstructured documents. Agents that triage pull requests. The common thread: clear inputs, clear outputs, measurable correctness, limited scope. These agents work in production. They save real time and money. They just don't make for exciting demos.
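The common thread, clear inputs, clear outputs, measurable correctness, can be made concrete with a minimal ticket-classification sketch. `call_model` is a hypothetical stand-in for a real LLM call; the point is the shape around it: a fixed label schema, output validation, and a human-review fallback for anything outside the schema.

```python
ALLOWED_LABELS = {"billing", "bug", "feature_request", "other"}

def call_model(prompt: str) -> str:
    """Hypothetical LLM call; returns the model's raw text output."""
    return "billing"

def classify_ticket(ticket_text: str) -> str:
    prompt = (
        "Classify this support ticket as exactly one of "
        f"{sorted(ALLOWED_LABELS)}.\nTicket: {ticket_text}\nLabel:"
    )
    label = call_model(prompt).strip().lower()
    # Constrain the output: anything outside the schema goes to a human queue
    # instead of propagating a hallucinated label downstream.
    return label if label in ALLOWED_LABELS else "needs_human_review"
```

Because the output space is closed and every prediction can be checked against a labeled sample, this kind of agent has something the general-purpose agent never had: a correctness number you can track in production.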
Coding assistants with agentic features. Claude Code, Cursor, GitHub Copilot — tools that can execute multi-step tasks within a development environment. They sit at the boundary between copilot and agent: autonomous enough to run a test loop, supervised enough that a developer reviews every meaningful change. This turned out to be the sweet spot — enough autonomy to save time, enough oversight to prevent disasters.
Structured workflow automation with AI components. n8n, Zapier, Make — automation platforms that now include LLM nodes for classification, extraction, summarization, and routing. The automation handles the deterministic parts. The LLM handles the parts that need language understanding. The combination is more robust than a pure agent because most of the pipeline is deterministic and only the ambiguous parts use the non-deterministic model.
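A sketch of that division of labor, with hypothetical helper names: deterministic parsing before the model, a single LLM node constrained to a JSON contract, and a deterministic routing table after it. Only the middle step is non-deterministic, and its failure mode is contained by the parse-and-fallback guard.

```python
import json
import re

def call_model(prompt: str) -> str:
    """Hypothetical LLM call for the one step needing language understanding."""
    return json.dumps({"intent": "refund"})

def handle_email(raw_email: str) -> str:
    # Deterministic: extract the subject with plain parsing, no model involved.
    match = re.search(r"^Subject: (.+)$", raw_email, re.M)
    subject = match.group(1) if match else ""
    # Non-deterministic: the LLM classifies intent, held to a strict JSON contract.
    reply = call_model(f'Return JSON {{"intent": ...}} for: {subject}')
    try:
        intent = json.loads(reply)["intent"]
    except (json.JSONDecodeError, KeyError):
        intent = "unknown"  # malformed model output degrades safely
    # Deterministic: a routing table that is trivially testable.
    routes = {"refund": "billing-queue", "outage": "oncall-pager"}
    return routes.get(intent, "human-triage")
```

The robustness comes from the shape: the model can only fail at one step, and a bad output falls through to human triage rather than taking an action.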
Where We Are on the Curve
If you accept the Gartner hype cycle as a useful — if imperfect — mental model, here's the placement:
General-purpose agents: Deep in the trough of disillusionment. The early enthusiasm has fully deflated. Most people who tried to build general agents between 2023 and 2025 have either pivoted to narrow agents or moved on entirely. The concept isn't dead — it's dormant, waiting for models that are reliable enough to justify it. That might be two years away or ten.
Narrow task agents: On the slope of enlightenment. The pattern is established, the tooling is maturing, and real deployments are producing real value. We're past the point where "agents in production" is novel and moving toward the point where it's expected. The remaining challenges are engineering challenges — monitoring, evaluation, cost management — not existential ones.
Agentic features in existing tools: Approaching the plateau of productivity. Code editors, automation platforms, CRM systems, support tools — the embedding of agent-like capabilities in products people already use is well underway and delivering value at scale. This is the most boring and most useful outcome.
What's Coming (12-24 Months)
Better tool use. Models are getting meaningfully better at selecting the right tool, calling it with correct parameters, and interpreting the results. This directly reduces the hallucinated tool call failure mode, which is one of the top reliability killers. Each percentage point of improvement in tool use accuracy makes agents viable for a wider range of tasks.
Longer reliable chains. Not just larger context windows — better utilization of those windows. The gap between "advertised context length" and "reliable operating context length" is closing. Anthropic's, OpenAI's, and Google's latest models all show improved performance on long-context tasks compared to a year ago. This means agents can handle more complex tasks before they start drifting.
Cheaper inference. The cost per token continues to fall. This matters for agents specifically because agents use tokens at a much higher rate than chat interactions — every loop iteration, every tool call result, every retry burns tokens. Tasks that were uneconomical at 2024 pricing are viable at 2026 pricing and will be cheap at 2027 pricing. Economic viability expands the set of tasks where agents beat human labor on cost.
Agent infrastructure as a commodity. Observability, evaluation, guardrails, cost management — the infrastructure layer for running agents in production is consolidating into products and libraries. Building a production-grade agent in 2024 meant building most of this infrastructure yourself. In 2026, you can buy or import it. In 2027, it'll be boring standard practice.
The prediction nobody wants to hear. Agents will be plumbing. Not magic, not revolutionary, not transformative — plumbing. Like databases, like APIs, like CI/CD pipelines. Essential infrastructure that does specific jobs reliably and that nobody writes breathless blog posts about. The "AI agent" as a product category will dissolve into "software that uses LLMs to automate specific tasks." That's not a failure. That's technology maturing.
The Verdict
We're past the hype and into the real work. The exciting phase — where every week brought a new framework that promised autonomous AI — is over. The useful phase — where teams figure out which tasks actually benefit from agent automation and build the infrastructure to run them reliably — is underway. It's less fun to tweet about and more valuable to deploy.
The honest summary: if you're betting on general-purpose autonomous agents, you're early by years and possibly by a paradigm shift. If you're betting on narrow agents for specific tasks with clear evaluation criteria, you're on time. If you're betting on agentic features embedded in tools you already use, you're already late — your competitors are using them. The hype cycle's lesson is the same as always: the technology works, just not the way the hype said it would, and it takes longer to get there, and when it arrives it looks more boring than anyone predicted.
This is part of CustomClanker's AI Agents series — reality checks on every major agent framework.