Claude: What It Actually Does in 2026
Anthropic's Claude is the LLM platform that serious text workers migrated to and mostly stayed on. It is the best model family available for long-form writing, complex instruction following, and code editing. It is not the best at everything, and it costs more than you expect once you start using it seriously.
What It Actually Does
Claude ships in three tiers, and the tiers matter more than Anthropic's marketing suggests. Haiku is the fast, cheap model — you use it for classification, extraction, and anything where you need sub-second latency and don't care about nuance. Sonnet is the daily driver, the model that handles 80% of real work: drafting, summarizing, code generation, analysis. Opus is the reasoning tier — the model you reach for when the task is genuinely hard, when you need the model to hold a complex problem in its head and work through it step by step. Per Anthropic's API pricing page, you're looking at roughly $0.25/$1.25 per million input/output tokens for Haiku, $3/$15 for Sonnet, and $15/$75 for Opus. Those Opus prices add up fast when you're doing extended reasoning passes over long documents.
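Those per-million-token rates make back-of-envelope budgeting easy. A minimal sketch, using only the prices quoted above (verify them against Anthropic's pricing page before relying on them; rates change):

```python
# Rough per-request cost estimator. The tier names and rates mirror the
# figures quoted in this article, not a live pricing feed.
PRICES = {  # tier: (input $/M tokens, output $/M tokens)
    "haiku": (0.25, 1.25),
    "sonnet": (3.00, 15.00),
    "opus": (15.00, 75.00),
}

def estimate_cost(tier: str, input_tokens: int, output_tokens: int) -> float:
    """Return the approximate USD cost of a single request."""
    in_rate, out_rate = PRICES[tier]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# A 100K-token document plus a 4K-token Opus response:
# (100_000 * 15 + 4_000 * 75) / 1_000_000 = $1.80 per pass
```

Run that calculation before a batch job over long documents: a hundred Opus passes at those sizes is real money, while the same workload on Sonnet costs a fifth as much.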
The tiering isn't just a pricing ladder. Each model has a genuinely different personality in practice. Haiku is terse and fast, good for structured output where you don't need the model to think. Sonnet is the sweet spot — fluent, capable, follows instructions well enough that you can hand it a style guide and get consistent output. Opus is slower and more expensive, but it catches things Sonnet misses. On complex coding tasks, on multi-step reasoning, on anything where the model needs to hold contradictory constraints in its head simultaneously — Opus earns its price. The gap between Sonnet and Opus is smaller than the gap between either of them and GPT-4o on instruction-following tasks, but it's real.
The feature that actually changes how you use Claude is Projects. You create a project, attach a system prompt and reference documents, and every conversation within that project inherits that context. This sounds trivial. It is not. In practice, it means you can build a persistent working environment — a project for "rewrite articles in this voice" with your style guide attached, a project for "review this codebase" with your architecture docs loaded, a project for "analyze these contracts" with your template agreements as reference. The system prompt becomes a reliable control surface. Claude follows system prompts more faithfully than any competitor I've tested, and Projects make that faithfulness compounding rather than one-shot.
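Projects is a web-app feature, but you can approximate the same persistent working environment over the raw API by prepending an identical system prompt and reference bundle to every call. A minimal sketch — the model id is a placeholder, and the exact `messages.create` parameter names should be checked against Anthropic's current API reference:

```python
# Approximating a Project over the API: one reusable context bundle
# that every conversation inherits.
def project_request(style_guide: str, reference_docs: list[str],
                    user_message: str) -> dict:
    """Build kwargs for a messages-style API call so each call starts
    from the same 'project' context (style guide + reference docs)."""
    system = style_guide + "\n\n# Reference material\n" + "\n---\n".join(reference_docs)
    return {
        "model": "claude-sonnet-latest",  # placeholder id, not a real model name
        "max_tokens": 2048,
        "system": system,
        "messages": [{"role": "user", "content": user_message}],
    }
```

The design point is the same one Projects makes in the UI: the system prompt is the control surface, so keep it in one place and reuse it verbatim rather than re-typing context per conversation.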
Extended thinking — Anthropic's chain-of-thought feature — is genuinely useful in specific scenarios. When you ask Opus to work through a hard math problem, a complex code refactor, or a multi-constraint optimization, extended thinking produces visibly better results. You can watch the model's reasoning in the thinking block, which is useful for debugging when it goes wrong. But extended thinking is not free. It burns tokens — sometimes a lot of tokens — and it slows responses from seconds to tens of seconds. For straightforward tasks, turning on extended thinking is pure overhead. I tested it across a week of mixed tasks: it improved output quality maybe 30% of the time and was irrelevant or counterproductive the rest. The key is knowing when to reach for it, which means knowing your task well enough to predict whether the model needs to reason or just generate.
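Because extended thinking bills its reasoning tokens as output, the sensible pattern is a per-request toggle with a capped budget. A sketch of the request shape — the `thinking` field follows Anthropic's API documentation at the time of writing, but treat the exact field names, the budget figure, and the model id as assumptions to verify:

```python
# Toggle extended thinking only for tasks that need reasoning.
def build_request(prompt: str, hard_task: bool) -> dict:
    """Build a request dict; enable a capped thinking budget only when
    the task genuinely requires multi-step reasoning."""
    req = {
        "model": "claude-opus-latest",  # placeholder id
        "max_tokens": 16_000,
        "messages": [{"role": "user", "content": prompt}],
    }
    if hard_task:
        # Reasoning tokens are billed as output, so cap the budget
        # rather than leaving it open-ended.
        req["thinking"] = {"type": "enabled", "budget_tokens": 8_000}
    return req
```

For straightforward generation, leaving the flag off avoids both the latency hit and the token burn described above.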
The 200K token context window is real but misleading. You can technically load 200K tokens into a conversation. According to Anthropic's documentation, the model can process and reason over all of it. In practice, what happens is more nuanced. Performance on recall tasks degrades well before you hit the limit — the "lost in the middle" problem that affects all transformer-based models. I tested Claude with documents at 50K, 100K, and 150K tokens and asked it to find specific details buried at various positions. At 50K, it was reliable. At 100K, it started missing things in the middle third of the document. At 150K, you need to be strategic about where you put the information you care about — front and back get more attention. This isn't a Claude-specific problem; every model does this. But the marketing makes you think 200K tokens means 200K tokens of equally weighted attention, and it doesn't.
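One practical workaround for that middle-of-document dropoff is to control where material lands in the prompt: put the chunks you most need recalled at the very front and very back, and let background filler absorb the weak middle. A hypothetical helper illustrating the idea:

```python
# Position-aware prompt assembly: critical chunks go to the edges of the
# context, where recall held up best in the tests described above.
def position_for_recall(critical: list[str], background: list[str]) -> str:
    """Join prompt sections with critical material at the front and back
    of the context and background material in the middle."""
    if len(critical) < 2:
        return "\n\n".join(critical + background)
    head, *rest = critical
    tail = rest.pop()  # send the last critical chunk to the very end
    return "\n\n".join([head] + rest + background + [tail])
```

This doesn't fix the underlying attention pattern, it just works around it; past roughly 100K tokens you should still expect to verify anything the model claims about the middle of the document.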
Where Claude genuinely pulls ahead of GPT: writing quality, instruction following, and code editing. Claude produces prose that sounds like a human wrote it, not like a language model trying to sound like a human wrote it. The distinction is subtle but real — fewer filler phrases, more natural paragraph transitions, better sense of when to stop. On instruction following, Claude will actually read your twelve-bullet style guide and follow all twelve bullets. GPT tends to follow six of them well and quietly ignore the rest. On code editing, Claude (especially through Claude Code) treats your codebase as a system rather than a collection of files, which makes multi-file refactors significantly more reliable.
Where GPT still wins: multimodal capabilities (particularly voice and vision), ecosystem breadth, and the sheer size of the third-party integration market. Claude's vision is competent but GPT-4o's is better, particularly for complex images, charts, and handwritten text. Claude has no native voice mode comparable to GPT-4o's real-time voice. And if you need your LLM to plug into Zapier, Make, or fifty other automation tools, GPT has more pre-built connectors. The plugin and GPT Store ecosystem, for all its messiness, gives GPT a distribution advantage that Claude's cleaner but smaller ecosystem can't match yet.

What The Demo Makes You Think
Anthropic's demos emphasize Claude's reasoning and writing quality. They show Opus solving a hard problem, or Claude producing a beautifully structured analysis, or extended thinking working through a complex task. What the demos don't show is the daily reality of managing costs, choosing the right tier for each task, and dealing with the limitations that every model has.
The fiddling trap with Claude is system prompt optimization. Because Claude follows system prompts so well, you start believing that the perfect system prompt will produce perfect output. So you spend hours refining your prompts, testing variations, adding constraints. This works — up to a point. But the returns diminish fast, and you can easily burn more time on prompt engineering than you'd save over months of usage. The practical move is to get your system prompt to "good enough" — usually two or three iterations — and then stop.
The other trap is tier selection paralysis. Opus is better than Sonnet. Sonnet is better than Haiku. But the cost differences are significant enough that using Opus for everything will run up a bill that makes your finance team nervous. I've seen teams default to Opus "because it's the best" and then get surprised by a $500/month API bill for tasks that Sonnet would have handled identically. The right approach is to default to Sonnet, escalate to Opus when you hit a task that Sonnet clearly struggles with, and use Haiku for anything that's fundamentally a classification or extraction problem.
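The default-to-Sonnet policy above is simple enough to encode as a router. The task categories and the escalation rule here are illustrative, not Anthropic guidance:

```python
# Tier router implementing the policy described above: Haiku for
# mechanical work, Sonnet by default, Opus only on demonstrated failure.
def pick_tier(task_kind: str, sonnet_failed: bool = False) -> str:
    """Route a task to a model tier by task kind and escalation state."""
    if task_kind in {"classification", "extraction", "routing"}:
        return "haiku"
    if sonnet_failed:
        return "opus"  # escalate only after Sonnet has actually struggled
    return "sonnet"
```

Even a heuristic this crude prevents the "Opus for everything" default that produces the surprise $500/month bill: escalation becomes a deliberate decision with a reason attached, not a standing habit.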
A month of serious Claude usage — meaning daily use of the Pro plan at $20/month for the web interface, plus moderate API usage for automation — runs somewhere between $50 and $200/month for an individual. For a small team using the API for production workloads, expect $500-2,000/month depending on volume and tier mix. These are real numbers from real usage, not theoretical calculations from the pricing page.
What's Coming (And Whether To Wait)
Anthropic has been on a consistent cadence of model improvements. Sonnet has gotten meaningfully better with each version, and the gap between Sonnet and Opus has narrowed over time. The reasonable expectation is that by mid-2026, the next Sonnet will handle tasks that currently require Opus, and the next Opus will push into territory that's currently unreliable for any model.
The features to watch are tool use and computer use. Anthropic has been investing in agentic capabilities — Claude's ability to use tools, browse the web, interact with APIs, and operate computer interfaces. These features are in various stages of beta and production readiness. If agentic AI is part of your roadmap, Claude is a reasonable bet — Anthropic is clearly prioritizing it, and Claude Code is already the best coding agent available. But "agentic AI" is also the hype cycle's current obsession, so calibrate your expectations accordingly.
Should you wait? No. Claude is useful now. The improvements are incremental and additive — you won't lose your investment in learning the platform, and the skills you build (prompt engineering, system prompt design, tier selection) transfer across model versions. If you're evaluating Claude vs. GPT vs. Gemini for the first time, the honest answer is: try all three for a week each on your actual workload. But if your work is primarily text — writing, analysis, code — Claude is where I'd start.
The Verdict
Claude earns a slot as the primary LLM for anyone whose work is text-heavy. Writers, developers, analysts, researchers — if your output is words or code, Claude produces better results with less prompt wrangling than the alternatives. The tiering system works once you learn to use it, Projects make it a genuine working environment rather than a chat toy, and the instruction following is best in class.
It does not earn a slot if your primary need is multimodal (use GPT-4o), if you need the cheapest possible inference at scale (use Gemini Flash or Llama), or if you need deep integration with Google's productivity suite (use Gemini). Every tool has a shape, and Claude's shape is text in, better text out. Know what you need, and pick accordingly.
Updated March 2026. This article is part of the LLM Platforms series at CustomClanker.