Building Your Own Agent: Frameworks vs. Raw API Calls
You've decided to build an agent. The first fork in the road is whether you reach for a framework — CrewAI, LangGraph, AutoGen, OpenAI Agents SDK — or just write code against the model API directly. The framework demos look productive. The raw API approach looks like reinventing the wheel. Neither impression is quite right, and the wrong choice at this fork costs you weeks, not hours.
What It Actually Does (Both Ways)
Let's start with what the raw API approach actually looks like, because most people overestimate its complexity.
A minimal agent is a while loop. You send a message to the model API with a list of available tools. The model either responds with text (it's done) or responds with a tool call (it wants to do something). You execute the tool call, send the result back, and loop. That's it. In Python, using any major model's API, this is 30-50 lines of code. Not 30-50 lines of clever, compressed code — 30-50 lines of straightforward, readable code that anyone with basic programming skills can understand and modify.
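As a concrete sketch of that loop, here it is with the provider call stubbed out. `call_model` stands in for whatever your model client returns, normalized to either final text or a tool request; the message shapes and names here are illustrative, not any provider's actual schema:

```python
def get_time(city: str) -> str:
    # Example tool; any plain function works.
    return f"12:00 in {city}"

TOOLS = {"get_time": get_time}

def agent_loop(user_message, call_model, max_steps=10):
    """Minimal agent loop. `call_model` takes the message history and
    returns either {"text": ...} (the model is done) or
    {"tool": ..., "args": ...} (the model wants to act)."""
    messages = [{"role": "user", "content": user_message}]
    for _ in range(max_steps):
        reply = call_model(messages)
        if "text" in reply:  # plain text: the model is finished
            return reply["text"]
        # Tool call: execute it and feed the result back into the loop.
        result = TOOLS[reply["tool"]](**reply["args"])
        messages.append({"role": "tool", "content": result})
    raise RuntimeError("agent exceeded max_steps")

# A fake model, so the loop runs end to end without a real API:
def fake_model(messages):
    if any(m["role"] == "tool" for m in messages):
        return {"text": "It's " + messages[-1]["content"] + "."}
    return {"tool": "get_time", "args": {"city": "Oslo"}}

print(agent_loop("What time is it in Oslo?", fake_model))
```

Swap `fake_model` for a thin wrapper around your provider's API and you have the real thing; the loop itself doesn't change.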
Here's what those 50 lines get you: a working agent loop with tool use, multi-step execution, and the ability to handle any tool you can define as a function. What they don't get you: state persistence across sessions, human-in-the-loop checkpoints, parallel tool execution, observability, retry logic, or multi-agent coordination. You can build all of those yourself. The question is whether you should.
Now the framework side. Frameworks provide the infrastructure that the raw loop doesn't: state management, tool integration boilerplate, observability hooks, error handling patterns, and — in the case of multi-agent frameworks — coordination between multiple agents. LangGraph gives you a state machine with nodes and edges. CrewAI gives you agents with roles and tasks. AutoGen gives you conversable agents with dialogue patterns. OpenAI's Agents SDK gives you an opinionated agent loop with handoffs and guardrails.
Each framework makes a bet about what abstraction you need. LangGraph bets you need a graph. CrewAI bets you need role-based delegation. AutoGen bets you need structured conversation. The bet matters because it shapes what's easy and what's hard. If your agent fits the framework's mental model, the framework saves time. If it doesn't, you spend more time fighting the framework than you would have spent writing the code from scratch.
What The Demo Makes You Think
The framework demos are seductive. Five minutes of setup, a clean notebook, and you've got a multi-agent system where a "researcher" agent finds information, a "writer" agent drafts content, and a "critic" agent reviews it. The demo runs, the agents pass messages to each other, and the output looks polished. You think: this is going to save me weeks.
Here's what the demo doesn't show you.
It doesn't show you customizing the behavior when the framework's default doesn't match your requirements. You want the researcher to use a specific API with specific authentication. The framework's tool integration expects a different interface. You're now writing an adapter layer between your tool and the framework's tool abstraction — code that wouldn't exist if you'd just called the API yourself. The adapter is small for simple cases and maddening for complex ones.
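In the simple case, that adapter is mostly argument plumbing. A hedged sketch, with hypothetical names (`search_internal` and the one-argument tool signature are invented for illustration):

```python
import functools

def search_internal(query: str, api_key: str, region: str) -> dict:
    # Your real API call with its specific authentication would go here.
    return {"query": query, "region": region, "hits": []}

def as_framework_tool(fn, **bound):
    """Bind auth/config up front so the framework sees the one-argument
    tool signature it expects."""
    @functools.wraps(fn)
    def tool(query: str) -> dict:
        return fn(query, **bound)
    return tool

search_tool = as_framework_tool(search_internal, api_key="...", region="eu")
```

The maddening cases are the ones where the framework also expects its own schema declarations, serialization, or async conventions, and the wrapper grows from five lines to fifty.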
It doesn't show you debugging through the framework's abstractions. When your agent produces wrong output, you need to figure out where the chain broke. In a raw loop, you add a print statement and see exactly what the model received and what it returned. In a framework, the data flows through state managers, callback handlers, and message routers. The framework might have observability tools (LangSmith for LangGraph, for example) that help with this. Or it might not, and you're reading framework source code to figure out why your tool call result got transformed somewhere between execution and the next model prompt.
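In a raw loop, that "print statement" is a small wrapper around the model call, nothing framework-specific. A sketch:

```python
def with_logging(call_model):
    """Wrap any model-call function so every exchange is visible as-is:
    exactly what the model received, exactly what it returned."""
    def wrapped(messages):
        print("-> model received:", messages)
        reply = call_model(messages)
        print("<- model returned:", reply)
        return reply
    return wrapped

# Usage: pass with_logging(call_model) wherever you'd pass call_model.
```

Because you own the call site, there is no layer between the log line and the payload.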
It doesn't show you the version churn. LangChain is the canonical example — its API surface changed enough between versions that tutorials from six months ago might not run — but it's not unique. Agent frameworks are evolving fast because the space is evolving fast. A framework that locked in architectural decisions a year ago might have made choices that don't align with current model capabilities. You inherit those choices and their constraints.
And it doesn't show you the production customization phase. The framework got your prototype running in an afternoon. Now you need to add: custom error handling for your specific tool failures, cost tracking per agent run, rate limiting against the model API, a specific output format that doesn't match the framework's default, and integration with your existing logging infrastructure. Each of these is possible within the framework. Each of these requires understanding the framework's extension points, which are documented with varying degrees of thoroughness. The sum total of these customizations often exceeds the code you would have written without the framework.
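To give a sense of scale for one item on that list: per-run cost tracking is small when you own the loop. A sketch with placeholder rates (not any provider's real pricing):

```python
class CostTracker:
    """Accumulates model spend for one agent run. The per-1k-token rates
    are illustrative defaults; substitute your provider's actual pricing."""
    def __init__(self, input_per_1k=0.003, output_per_1k=0.015):
        self.input_per_1k = input_per_1k
        self.output_per_1k = output_per_1k
        self.cost = 0.0

    def record(self, input_tokens: int, output_tokens: int) -> None:
        self.cost += (input_tokens / 1000) * self.input_per_1k
        self.cost += (output_tokens / 1000) * self.output_per_1k

tracker = CostTracker()
tracker.record(input_tokens=2000, output_tokens=500)
```

Each such piece is trivial in isolation; the cost is the accumulation, plus finding where in the framework's lifecycle to hook each one.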
The Decision Framework
Here's when each approach wins. Not "it depends" — specific conditions.
Use the raw API when:
Your agent does one thing in a loop. It calls tools, processes results, and produces output. The loop is linear or has simple branching. You don't need state persistence across sessions. You don't need multiple agents coordinating. You want to understand every line of code in your system. This covers more use cases than you'd think — most useful agents are simpler than frameworks assume.
You need maximum control over the model interaction. You want to manage the prompt exactly, control token usage precisely, implement custom retry logic, or switch between model providers without changing your agent architecture. Frameworks add abstraction between you and the API call. Sometimes that abstraction helps. Sometimes it's a wall between you and the thing you need to control.
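Custom retry logic, for instance, is a dozen lines when you own the call site. A sketch with exponential backoff and jitter (in practice you would catch only retryable errors such as rate limits and timeouts, not bare `Exception`, and tune the delays to your provider's behavior):

```python
import random
import time

def call_with_retry(fn, *args, attempts=3, base_delay=1.0, **kwargs):
    """Retry `fn` with exponential backoff plus jitter. Narrow the except
    clause to your API's transient error types before using this for real."""
    for attempt in range(attempts):
        try:
            return fn(*args, **kwargs)
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error
            time.sleep(base_delay * (2 ** attempt) * (0.5 + random.random()))
```

When a framework owns the API call, this same policy has to be expressed through whatever retry configuration the framework exposes, if it exposes one.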
You're building for production from day one. Frameworks optimize for prototype speed. Production code optimizes for debuggability, reliability, and maintainability. These are different goals, and the framework's choices for the first goal can actively hinder the second.
Use a framework when:
You need a state machine. If your agent has complex branching logic — "if the tool returns X, go to step A; if it returns Y, go to step B; if it returns an error, retry twice then escalate" — and you need this state to be persistent, checkpointed, and recoverable, then you're building a state machine. LangGraph is a state machine framework. Writing your own state machine is possible and often unpleasant. This is the strongest case for a framework.
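For contrast, here is the hand-rolled version of that branching, minus persistence, checkpointing, and recovery, which are exactly the parts a framework adds. State names and handlers are made up for illustration:

```python
def run_state_machine(start, handlers, max_transitions=20):
    """handlers maps a state name to a function(context) -> (next_state,
    context); a next_state of None terminates the run."""
    state, ctx = start, {}
    for _ in range(max_transitions):
        if state is None:
            return ctx
        state, ctx = handlers[state](ctx)
    raise RuntimeError("state machine did not terminate")

def fetch(ctx):
    ctx["result"] = "X"  # pretend the tool returned X
    return ("step_a" if ctx["result"] == "X" else "step_b"), ctx

def step_a(ctx):
    ctx["output"] = "handled X"
    return None, ctx

handlers = {"fetch": fetch, "step_a": step_a}
```

The transition table stays readable at this size; it's once you need to checkpoint `ctx` durably, resume after a crash, and replay history that the hand-rolled version turns unpleasant.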
You need human-in-the-loop at specific decision points. Pausing agent execution, presenting the current state to a human, waiting for approval, and resuming — this is genuinely tricky to build from scratch and well-supported by frameworks like LangGraph and AutoGen. If your agent needs human checkpoints, the framework saves meaningful engineering effort.
You need multi-agent coordination. If your problem genuinely requires multiple specialized agents passing information between each other — not because it sounds cool, but because the task decomposition actually benefits from it — frameworks provide the coordination plumbing. Note that most problems don't actually need multiple agents. A single agent with multiple tools usually outperforms a crew of agents with divided responsibilities, because you're eliminating the coordination overhead and message-passing ambiguity.
You're prototyping and expect to throw it away. If the goal is to validate whether an agent can do the job at all — before committing to production architecture — a framework gets you to the answer faster. The key is acknowledging that the prototype is disposable. The problems start when the prototype becomes the production system.
The Hybrid Path
Most experienced teams end up somewhere in the middle, and it's worth naming this path explicitly.
The pattern: use a framework for orchestration — state management, checkpointing, human-in-the-loop flow — but bypass the framework for actual model interaction. Make your own API calls, manage your own prompts, handle your own tool execution. Let the framework handle the graph structure and state transitions. This gives you the framework's infrastructure benefits without the framework's opinions about how you should talk to the model.
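The pattern in miniature, assuming a LangGraph-style node interface (a function from state to a state update) with the model call stubbed for illustration:

```python
def model_node(state: dict) -> dict:
    """A graph node that owns its model interaction: it builds its own
    prompt, makes its own API call, and parses its own output. The
    framework only routes state into and out of the node."""
    prompt = f"Answer concisely: {state['question']}"
    return {"answer": raw_model_call(prompt)}

def raw_model_call(prompt: str) -> str:
    # Stand-in for a direct HTTP/SDK call to your provider of choice.
    return f"(model reply to: {prompt})"
```

Everything inside the node is yours: the prompt, the retries, the provider. The framework's job reduces to deciding when the node runs and what state it sees.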
LangGraph supports this pattern relatively well — you can put arbitrary code in graph nodes, including raw API calls. CrewAI is more opinionated and harder to bypass for the model layer. OpenAI's Agents SDK is tightly coupled to OpenAI's models by design. Your mileage will vary by framework.
The other hybrid: start raw, extract patterns into a framework later. Build your agent with direct API calls. When you find yourself implementing state persistence, retry logic, and checkpoint recovery for the third time, you've identified the parts where a framework would help. Adopt the framework for those specific concerns, not for the whole system. This is more work up front and less regret later.
What's Coming
The framework landscape is consolidating. A year ago there were dozens of agent frameworks competing for attention. In 2026, the field appears to have narrowed to LangGraph, CrewAI, AutoGen, and OpenAI's Agents SDK, with most of the rest absorbed, abandoned, or relegated to niches. This consolidation is healthy — it means the surviving frameworks are battle-tested and their abstractions are stabilizing.
Model providers are also building more agent primitives into their APIs directly. Tool use, structured output, system prompts with behavioral constraints — features that required framework code a year ago are now API parameters. The more the API provides natively, the less the framework needs to abstract, and the simpler the raw API approach becomes.
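For example, native tool definitions in OpenAI-style chat APIs are now just a request parameter: a JSON Schema description per tool. The exact shape varies by provider, so treat this as a sketch and check the current API reference:

```python
# One tool definition in the function-calling style several providers use:
# a name, a description the model reads, and a JSON Schema for arguments.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]
```

Declaring tools this way used to require framework glue; now it's plain data passed alongside the messages.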
The trajectory points toward thinner frameworks. Not "no frameworks," but frameworks that do less — that handle orchestration and state without trying to own the entire stack from prompt to output. The thick frameworks that wrap everything in proprietary abstractions are the ones losing mindshare. The thin ones that let you bring your own model calls are the ones growing.
The Verdict
The framework vs. raw API decision is not about technical sophistication. Raw API is not "doing it the hard way." Frameworks are not "doing it the smart way." They're different tradeoffs for different situations, and the wrong choice in either direction costs real time.
If your agent is a loop with tools, start raw. You'll understand your system completely, debug it easily, and add complexity only when you need it. If your agent is a state machine with human checkpoints and multi-step branching, use a framework — you'll be grateful for the state management on day two.
The honest summary: most agents are simpler than the framework ecosystem implies. A while loop and some tool definitions get you surprisingly far. Frameworks earn their complexity when your agent needs infrastructure — state, persistence, coordination, checkpoints — that you don't want to build yourself. The mistake is reaching for the framework before you've confirmed you need what it provides.
This is part of CustomClanker's AI Agents series — reality checks on every major agent framework.