Claude Code as Agent: What Anthropic Actually Shipped

Claude Code started as a coding assistant. Somewhere along the way — through iterative model upgrades, MCP integrations, and a plan-execute-verify loop that actually works more often than not — it became the closest thing to a reliable coding agent anyone has shipped. The marketing doesn't say "agent" in neon letters. The behavior does. And the gap between what it does and what people assume it does is the whole story.

What It Actually Does

Claude Code runs an agentic loop. That's a specific claim, so let me be specific about what it means.

When you give Claude Code a task — "add rate limiting to this API" — it doesn't just generate a code block. It reads your codebase to understand how your API is structured. It identifies which files need changes. It makes those changes across multiple files. It runs your tests. If the tests fail, it reads the error output, diagnoses the problem, edits its own work, and runs the tests again. This loop — plan, execute, observe, correct — is what separates an agent from an autocomplete engine. Claude Code does it well enough that you can hand it a real task and come back to a working diff.
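The loop the paragraph describes can be sketched in a few lines. This is a toy illustration, not Claude Code's internals: the "codebase" is a dict, the "test suite" is a predicate, and the fix is hard-coded so the loop visibly converges.

```python
# Minimal sketch of a plan / execute / observe / correct loop.
# All helpers are hypothetical stand-ins, not Claude Code internals.

def run_tests(codebase: dict) -> tuple[bool, str]:
    """Observe: a stand-in test suite for a toy rate limiter."""
    if codebase.get("max_requests_per_min") == 60:
        return True, ""
    return False, f"expected 60, got {codebase.get('max_requests_per_min')}"

def diagnose_and_fix(codebase: dict, error: str) -> None:
    """Correct: read the failure output and edit prior work."""
    codebase["max_requests_per_min"] = 60  # toy "fix" derived from the error

def run_agent(codebase: dict, max_rounds: int = 5) -> bool:
    # Plan + execute: make an initial (wrong) edit, as a first attempt might.
    codebase["max_requests_per_min"] = 100
    for _ in range(max_rounds):
        ok, error = run_tests(codebase)    # observe
        if ok:
            return True                    # verified: hand back a working diff
        diagnose_and_fix(codebase, error)  # correct and loop again
    return False                           # surface the failure to the human

codebase = {}
print(run_agent(codebase))  # True: the loop converged after one correction
```

The bounded retry count matters in practice: without it, an agent that keeps failing the same test would loop forever instead of handing the problem back.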

The tool use layer is where the agent framing earns its keep. Claude Code reads files, writes files, runs terminal commands, manages git operations, and connects to external services through MCP. It's not just relaying prompts and returning text. It's operating inside your development environment with actual filesystem access and the ability to execute the results of its own reasoning. When it writes a function, it can immediately run the tests that prove the function works. When it creates a branch, it can commit to it. This is substantively different from pasting code into a chat window.
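The write-then-verify pattern looks something like this at the filesystem level. The file names and test command are illustrative, not anything Claude Code actually emits; the point is that the tool writes a file and immediately executes proof that it works.

```python
# Sketch of write-then-verify: create a function, then run a test
# against it in the same pass. File names are invented for illustration.
import pathlib
import subprocess
import sys
import tempfile

workdir = pathlib.Path(tempfile.mkdtemp())

# "Write a function" -- a toy rate-limit check plus a test for it.
(workdir / "ratelimit.py").write_text(
    "def allow(count, limit=60):\n    return count < limit\n"
)
(workdir / "test_ratelimit.py").write_text(
    "from ratelimit import allow\n"
    "assert allow(10) and not allow(60)\n"
)

# "Immediately run the tests that prove the function works."
result = subprocess.run(
    [sys.executable, "test_ratelimit.py"],
    cwd=workdir, capture_output=True, text=True,
)
print("tests passed" if result.returncode == 0 else result.stderr)
```

A chat window gives you the first half of this; the exit code from the second half is what closes the loop.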

The planning step deserves separate mention. With extended thinking enabled, Claude Code reasons through complex tasks before acting — mapping dependencies, considering edge cases, ordering operations. The thinking isn't always visible (or always correct), but it's the difference between an agent that dives in and one that looks before it leaps. On multi-step refactors, the planning phase meaningfully reduces the rate of cascading errors where one wrong early decision poisons everything downstream.
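The "ordering operations" part of planning can be made concrete. If each edit depends on earlier edits, doing them out of order is exactly how one wrong early decision cascades. A toy dependency map, ordered with the standard library's topological sorter (the step names are invented):

```python
# Toy illustration of ordering dependent edits before acting.
# The dependency map is hypothetical, not Claude Code's plan format.
from graphlib import TopologicalSorter

# step -> the steps it depends on
deps = {
    "define limiter": set(),
    "add middleware": {"define limiter"},
    "wire into routes": {"add middleware"},
    "update tests": {"wire into routes"},
}

# static_order() yields each step only after its dependencies.
order = list(TopologicalSorter(deps).static_order())
print(order)
```

Running the steps in this order means a failure at "add middleware" stops the run before "wire into routes" builds on top of a broken foundation.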

Where it genuinely saves hours: navigating unfamiliar codebases (it greps, traces, and builds a working model faster than you can), writing test suites for existing code, generating boilerplate that follows your project's patterns, and one-shot refactors that touch many files but follow a consistent pattern. These aren't party tricks. They're daily time savings on the order of 1-3 hours for a working developer.

What The Demo Makes You Think

The demos show Claude Code building full applications from natural language descriptions. Terminal scrolling, files appearing, tests passing — the whole sequence compressed into a two-minute clip. It looks like the tool does the work and the human just describes what they want.

Here's what gets left out.

The demos almost never show iteration on ambiguous requirements. "Build me an API" is a clear task. "Make this API handle the edge case where the user's subscription lapsed but they have a grace period that depends on their plan tier" is a real task. Claude Code handles the first kind brilliantly. On the second kind, it makes assumptions. Sometimes the assumptions are reasonable. Sometimes they're confidently wrong in ways that pass tests because the tests were also wrong in the same direction.

The demos don't show what happens when the context window fills up. Claude Code with a fresh context is a different tool than Claude Code 90 minutes into a complex session. Early in a session, it remembers every constraint, every file it's read, every decision it's made. Late in a session, it starts contradicting itself — reimplementing something it already built, forgetting a constraint you stated explicitly, or losing track of the architectural pattern it was following three steps ago. The 200K token context window sounds enormous until you're working in a real codebase with dozens of relevant files and a chain of dependent edits. It fills up faster than you'd expect, and degradation is gradual enough that you don't always notice it happening.
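Some back-of-envelope arithmetic shows why the window fills faster than 200K sounds. The ~4 characters-per-token ratio is a rough heuristic for mixed English and code, and the session numbers below are invented for illustration:

```python
# Rough estimate of context consumption in a real session.
# All session figures are hypothetical; ~4 chars/token is a heuristic.
CONTEXT_TOKENS = 200_000
CHARS_PER_TOKEN = 4

files_read = 30              # relevant source files in a mid-size task
avg_file_chars = 12_000      # ~300 lines at ~40 chars/line
tool_output_chars = 150_000  # test runs, grep results, diffs over a session
conversation_chars = 60_000  # instructions, plans, explanations

total_chars = files_read * avg_file_chars + tool_output_chars + conversation_chars
used_tokens = total_chars // CHARS_PER_TOKEN
print(f"~{used_tokens:,} of {CONTEXT_TOKENS:,} tokens "
      f"({100 * used_tokens // CONTEXT_TOKENS}% of the window)")
```

Under those assumptions, a single 90-minute session sits around 70% full before any re-reads, which is exactly the regime where the gradual degradation described above sets in.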

They also skip the cost math. Claude Code on API billing runs through tokens fast. A heavy day of agent usage — the kind where you're using it as a genuine development partner, not just asking quick questions — can cost $20-50 on Sonnet, more on Opus [VERIFY]. The Max plan with its included usage changes this equation, but if you're on the API tier, you need to know what you're spending. The demos show the output. They don't show the invoice.
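The invoice math is simple enough to sketch. The per-token rates below are assumptions for illustration (check Anthropic's current pricing page before trusting them), and the usage figures describe a hypothetical heavy agent day:

```python
# Sketch of API cost for a heavy agent day. Rates and usage are
# assumptions for illustration, not quoted pricing.
INPUT_PER_MTOK = 3.00    # $ per million input tokens (assumed Sonnet-class rate)
OUTPUT_PER_MTOK = 15.00  # $ per million output tokens (assumed)

input_tokens = 8_000_000  # files re-read across many agent turns
output_tokens = 600_000   # generated code, diffs, explanations

cost = (input_tokens / 1e6) * INPUT_PER_MTOK \
     + (output_tokens / 1e6) * OUTPUT_PER_MTOK
print(f"${cost:.2f} for the day")
```

Note which term dominates: agent sessions re-read files constantly, so input volume, not generated code, drives the bill.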

And the demos never show the cases where you need to stop it. Claude Code sometimes commits to an approach that's heading in the wrong direction, and the further it goes, the more work it generates that needs to be unwound. Knowing when to interrupt and redirect is a skill the demos don't teach because the demos never show the tool being wrong.

What's Coming

Anthropic iterates fast. The model improvements from Claude 3.5 Sonnet through Sonnet 4 to the current generation have made Claude Code meaningfully better at multi-step reasoning, tool use reliability, and self-correction. Each model upgrade makes the agent loop tighter.

The areas still being actively developed: better long-session memory (the context degradation problem is partly architectural and won't be fully solved by one release), more reliable planning for tasks that require many dependent steps, and broader MCP integrations that let the agent interact with more of your development infrastructure. The headless mode — where you give Claude Code a task and it executes without interaction — is improving but still requires careful scoping to avoid drift on complex work.

IDE integration is the other frontier. Claude Code is terminal-native, which is a feature for developers who live in the terminal and a barrier for everyone else. Deeper integration with VS Code and other editors is coming, and it matters because the target audience for a coding agent isn't only people who prefer the command line.

Should you wait for these improvements? No. Claude Code is useful now in a way that meaningfully changes how fast you ship code. The improvements will make it better. They won't change whether it's worth using.

How It Compares

Claude Code operates in a different category than Cursor or GitHub Copilot, though the marketing overlap makes this confusing. Copilot is fundamentally a completer — it predicts what you're about to type and offers to type it for you. Cursor is a hybrid — it has agentic features (multi-file edits, command execution) layered on top of an IDE experience. Claude Code is agent-first. It reads the whole context, plans, acts, and verifies.

The practical difference: Copilot saves you keystrokes. Cursor saves you context switches. Claude Code saves you tasks. These are different value propositions, and the right tool depends on how you work. If you want AI embedded in your editor that makes you faster at writing code, Cursor is probably the better fit. If you want an agent that can take a task description and produce a working implementation across multiple files while you do something else, Claude Code is the tool.

Against Devin (covered separately in this series), Claude Code wins on speed, cost, and integration with your existing workflow. Devin wins on autonomy for well-scoped tickets and browser-based tasks. They're different tools solving adjacent problems.

The Verdict

Claude Code is the best coding agent available as of early 2026. That's a meaningful claim and a limited one. "Best coding agent" still means you're reviewing its work, managing its context window, and making the architectural decisions it can't. It is a force multiplier, not a replacement.

It earns a slot if you write code professionally. The time savings are real and measurable — not "10x developer" nonsense, but consistent hours per day on mechanical work that previously required your full attention. The agent loop (plan, execute, verify, correct) works well enough for most standard development tasks that the question isn't whether to use it but how to use it effectively.

It does not earn a slot if you don't code, if you need fully autonomous development, or if you need an AI that can make product decisions. Claude Code is a technical tool for technical people. The agent capabilities make it more than an autocomplete engine — but it's not a replacement for the judgment, experience, and domain knowledge that make a developer valuable.

The honest framing: Claude Code does the mechanical 70% of development work at 90% quality. The remaining 30% — architecture, edge case reasoning, knowing when something is subtly wrong — is still yours. That 70% is a lot of hours. The 30% is why you still have a job.


This is part of CustomClanker's AI Agents series — reality checks on every major agent framework.