Tool Use and Function Calling: How Claude Goes From Knowing Things to Doing Things

There's a version of Claude that answers questions. It's knowledgeable, articulate, and fundamentally passive — you ask, it responds, you copy the output and go do something with it. Then there's the version of Claude that takes actions. It checks your calendar, queries a database, sends a message, creates a file, runs a command. The difference between these two versions is tool use, also called function calling. It's the mechanism that turns a language model into something that can interact with external systems. Understanding it, even at a conceptual level, changes how you think about what Claude can do — and it explains why tools like Claude Code and MCP-connected setups feel qualitatively different from plain chat.

What The Docs Say

Per Anthropic's API documentation, tool use lets you define external functions that Claude can invoke during a conversation. You describe each tool with a name, a description in natural language, and a JSON schema for its parameters. When Claude determines that a tool would help answer the user's request, it responds not with text but with a structured tool call — specifying which tool to invoke and what arguments to pass. Your application executes the function, sends the result back to Claude, and Claude incorporates that result into its final response to the user.
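A tool definition of this kind can be sketched as a plain dictionary. The shape below (name, description, `input_schema` holding a JSON schema) follows the format Anthropic's Messages API documents; the get_weather tool itself is illustrative, not a real service.

```python
# A tool definition: a name, a natural-language description, and a
# JSON schema describing the parameters Claude should supply.
get_weather_tool = {
    "name": "get_weather",
    "description": (
        "Gets the current weather for a given location. "
        "Use this when the user asks about current conditions."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "location": {
                "type": "string",
                "description": "City and state, e.g. 'Portland, OR'",
            },
        },
        "required": ["location"],
    },
}
```

The description does double duty: it documents the tool for you, and it is the text Claude reads when deciding whether and how to call it.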

The documentation describes this as a multi-step loop. Step one: you send a message to Claude along with your tool definitions. Step two: Claude analyzes the request and decides whether to use a tool. If yes, it returns a response with a tool_use content block containing the tool name and arguments. Step three: your code executes the tool — makes the API call, queries the database, whatever the tool does — and sends the result back as a tool_result message. Step four: Claude uses the result to formulate its response. This loop can repeat — Claude can call multiple tools in sequence, using the output of one to inform the next. Anthropic calls this agentic behavior when the loop runs autonomously.
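The four steps above can be sketched as a control loop. In this sketch, `call_claude` is a stand-in for the real Messages API call and returns canned responses so the flow can run without network access; the message and content-block shapes follow Anthropic's documented format.

```python
# Minimal sketch of the tool-use loop: send tools, execute any tool_use
# blocks Claude returns, feed results back, repeat until Claude answers.

def call_claude(messages, tools):
    # Stand-in for the API. First turn: Claude requests a tool call.
    # Second turn (after a tool_result arrives): final text response.
    got_tool_result = any(
        m["role"] == "user" and isinstance(m["content"], list)
        for m in messages
    )
    if not got_tool_result:
        return {"stop_reason": "tool_use",
                "content": [{"type": "tool_use", "id": "tu_1",
                             "name": "get_weather",
                             "input": {"location": "Portland, OR"}}]}
    return {"stop_reason": "end_turn",
            "content": [{"type": "text",
                         "text": "It's 65 and partly cloudy in Portland."}]}

def run_tool(name, args):
    # Your code executes the tool; hard-coded here for the sketch.
    return "65 degrees, partly cloudy"

def tool_loop(user_text, tools):
    messages = [{"role": "user", "content": user_text}]
    while True:
        response = call_claude(messages, tools)
        if response["stop_reason"] != "tool_use":
            return response["content"][0]["text"]
        # Echo Claude's tool request back, then append the results.
        messages.append({"role": "assistant",
                         "content": response["content"]})
        results = [{"type": "tool_result", "tool_use_id": b["id"],
                    "content": run_tool(b["name"], b["input"])}
                   for b in response["content"] if b["type"] == "tool_use"]
        messages.append({"role": "user", "content": results})
```

The structure is the point: the loop keeps running as long as Claude keeps asking for tools, which is exactly the agentic behavior the docs describe.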

How It Actually Works

The mechanics are cleaner than you might expect. Here's the essential flow, stripped of boilerplate. You define a tool like this: name it get_weather, describe it as "Gets the current weather for a given location," and define its parameters as a JSON schema with a required location string. You send this definition along with the user's message ("What's the weather in Portland?") to the API. Claude doesn't generate a text response. Instead, it returns a structured object saying "I want to call get_weather with location: Portland, OR." Your code takes that, calls whatever weather API you use, gets the result (65 degrees, partly cloudy), and sends it back. Claude then responds to the user: "It's currently 65 degrees and partly cloudy in Portland."

What makes this interesting is what happens in the gap between "the user asked about weather" and "Claude decided to call the weather tool." Claude read your tool description, understood what the tool does, matched it to the user's intent, and formatted the arguments correctly — all without explicit programming. You didn't write an if-statement that says "if the user mentions weather, call the weather API." Claude figured that out from the description. This is why the quality of your tool descriptions matters enormously. A well-described tool gets called appropriately. A poorly described one gets called when it shouldn't be, or not called when it should be.

I tested this extensively and Claude's tool selection accuracy is genuinely impressive. With 10-15 well-described tools available, Claude consistently chose the right one and formatted arguments correctly. More impressively, it consistently chose not to call tools when a direct response was better. Ask "what's the capital of France" when a search_web tool is available, and Claude just answers from its knowledge. Ask "what's the current population of France" and it reaches for the search tool, because it knows its training data might be outdated. This discrimination — knowing when to use tools and when not to — is harder than it sounds and is one of Claude's genuine strengths relative to other models.

Where things get more nuanced is with complex tool interactions. Claude can chain tool calls: check the calendar for free time, then create a meeting, then send an email about it. Each step depends on the previous result. In my testing, Claude handles two- and three-step chains reliably. Beyond that, error rates increase. Not because the individual tool calls fail, but because Claude's plan for the full chain becomes less coherent. It might query the calendar, get back availability, then create the meeting at a time that technically conflicts with something it should have noticed. The longer the chain, the more opportunities for these subtle reasoning failures.

The Error Handling Problem

Here's what the docs underemphasize: error handling is your problem, and it matters more than you'd think. When a tool call fails — the API returns an error, the database is down, the parameters were wrong — you need to send that error back to Claude in a way it can work with. The naive approach is to send the raw error message. The better approach is to send a structured error with enough context for Claude to either retry with different parameters or explain the failure to the user. "HTTP 404: Not Found" tells Claude nothing useful. "The user 'jsmith' was not found in the database; available users can be searched with the search_users tool" tells Claude how to recover.
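The contrast can be made concrete. The `tool_result` shape (including the `is_error` flag) follows Anthropic's documented format; the user-lookup scenario and the search_users tool are hypothetical.

```python
# Two ways to report the same failure back to Claude.

def opaque_error(tool_use_id):
    # Tells Claude nothing about what went wrong or how to recover.
    return {"type": "tool_result", "tool_use_id": tool_use_id,
            "is_error": True, "content": "HTTP 404: Not Found"}

def actionable_error(tool_use_id, username):
    # Names the problem and points at a recovery path Claude can take.
    return {"type": "tool_result", "tool_use_id": tool_use_id,
            "is_error": True,
            "content": (f"User '{username}' was not found in the database. "
                        "Available users can be searched with the "
                        "search_users tool.")}
```

With the second form, Claude can call search_users and retry on its own; with the first, the best it can do is apologize to the user.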

In practice, about 15-20% of my tool use interactions involved some form of error or unexpected result that required Claude to adapt. Claude handles this reasonably well when the error messages are informative. It handles it poorly when they're opaque. The worst case is a silent failure — the tool returns empty results without indicating whether that means "no results found" or "something went wrong." Claude will confidently interpret empty results as "no results found" almost every time, even when the actual cause was a connection timeout. Building robust tool use means building robust error reporting in your tools. This is not glamorous work and it's not covered in most tutorials.
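One way to eliminate the silent-failure case is to make the tool's return value distinguish "empty" from "broken" explicitly. A minimal sketch, where `search_orders` is a hypothetical backend call that may raise on timeout:

```python
# Wrap a flaky backend so "no results" and "something went wrong" are
# never the same answer.

def search_orders(query):
    # Stand-in backend; simulates a connection failure for the sketch.
    raise TimeoutError("connection timed out")

def safe_search(query):
    try:
        results = search_orders(query)
    except Exception as exc:
        # Explicit failure: Claude can retry or report it, instead of
        # confidently concluding "no orders matched".
        return {"status": "error", "message": f"Search failed: {exc}"}
    if not results:
        return {"status": "ok", "results": [],
                "message": "No orders matched the query."}
    return {"status": "ok", "results": results}
```

The extra field costs a few tokens per call and removes an entire class of confident misinterpretations.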

MCP vs. Raw Tool Definitions

Model Context Protocol — MCP — is Anthropic's standardized way of connecting tools to Claude. Instead of defining tools directly in your API calls, you run MCP servers that expose tools through a standard protocol. Claude Code, Claude.ai's desktop app, and various third-party integrations use MCP to connect to external services.

The practical question is when you need MCP versus when direct tool definitions are enough. Direct definitions are simpler. You define tools in your API call, you handle execution in your code, everything runs in one process. For a specific application with a fixed set of tools — a customer service bot that can look up orders and process returns — direct definitions are the right choice. Less infrastructure, less complexity, fewer moving parts.
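For the direct-definition case, execution is often nothing more than a dictionary dispatch in your own process. A sketch for the customer-service example, where `look_up_order` and `process_return` are hypothetical handlers:

```python
# Direct tool execution: map tool names to local functions and call them
# with the arguments Claude supplied.

def look_up_order(order_id):
    return {"order_id": order_id, "status": "shipped"}

def process_return(order_id, reason):
    return {"order_id": order_id, "return_started": True, "reason": reason}

HANDLERS = {
    "look_up_order": look_up_order,
    "process_return": process_return,
}

def execute_tool_call(name, args):
    if name not in HANDLERS:
        # Report unknown tools back rather than crashing the loop.
        return {"status": "error", "message": f"Unknown tool: {name}"}
    return HANDLERS[name](**args)
```

Everything lives in one process, which is exactly the "less infrastructure, fewer moving parts" trade-off described above.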

MCP makes sense when tools need to be modular, shareable, or maintained independently. If you want Claude Code to access your company's internal APIs, you build an MCP server that exposes those APIs as tools. Now anyone on your team with Claude Code can use them without modifying their Claude Code installation. If you want to connect Claude to multiple services — GitHub, Slack, a database, a calendar — each one can be a separate MCP server. You compose capabilities by connecting servers, not by editing a monolithic tool definition file.

The MCP ecosystem has grown substantially. There are community-built servers for dozens of services, and the protocol is open enough that building your own is a weekend project for a competent developer. But MCP also adds complexity. You're running servers, managing connections, handling authentication for each service. For simple use cases, this is over-engineered. For complex agent setups with many tools and multiple users, it's the right abstraction. The dividing line in my experience is around 5-7 tools. Below that, direct definitions. Above that, or if tools need to be shared across multiple contexts, MCP earns its overhead.

The Agent Loop

Tool use becomes transformative when it's combined with reasoning in an autonomous loop. This is what people mean when they say "agent." The loop works like this: Claude receives a task, thinks about what to do, calls a tool, gets the result, thinks about what the result means, decides whether to call another tool or respond to the user, and continues until the task is done. Claude Code is the most visible example — you say "add error handling to the authentication module," and Claude reads files, understands the code, makes edits across multiple files, runs tests, fixes failures, and reports back. Each step is a tool use interaction. The agent quality comes from the loop, not from any individual tool call.
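The loop itself fits in a few lines. In this sketch, `decide_next_action` stands in for the model's judgment and follows a scripted plan so the loop can run without network access; the file-editing tools are hypothetical. The step budget is a common safeguard, not an Anthropic requirement.

```python
# The agent loop in miniature: decide, act, observe, repeat until the
# task is done or a step budget runs out.

PLAN = [("read_file", {"path": "auth.py"}),
        ("edit_file", {"path": "auth.py"}),
        ("run_tests", {}),
        ("done", {})]

def decide_next_action(history):
    # Stand-in for a model call: pick the next step given what's happened.
    return PLAN[min(len(history), len(PLAN) - 1)]

def run_tool(name, args):
    return f"{name} ok"  # stand-in for real tool execution

def agent_loop(task, max_steps=10):
    history = []
    for _ in range(max_steps):
        action, args = decide_next_action(history)
        if action == "done":
            return history
        history.append((action, run_tool(action, args)))
    return history  # budget exhausted; caller decides how to surface this
```

Every real agent framework is an elaboration of this shape: richer state, better decision-making, error recovery — but the same loop.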

Where the agent loop breaks is predictability. A direct tool call — "get the weather" — is deterministic in intent even if the result varies. An agent loop — "refactor this module to use dependency injection" — involves Claude making judgment calls at every step. Which files to read, what pattern to follow, whether to change a function signature or add a wrapper. These judgment calls are usually good. But "usually good" means "sometimes wrong," and the longer the loop runs, the more judgment calls accumulate. I've had Claude Code make 15 tool calls to accomplish a task that needed 4, because it went down an investigative path that turned out to be irrelevant. The token cost — and time cost — of these detours is real.

The practical lesson: agent loops are most reliable for well-defined tasks with clear success criteria. "Add a created_at field to the users table and update the API to return it" is a good agent task. "Make the application faster" is a bad one. The more specific the goal, the fewer judgment calls in the loop, the better the result. This isn't a limitation of Claude specifically — it's a limitation of autonomous systems in general. But it's worth understanding because the demo videos always show the clean runs, and the reality includes the messy ones.

For Non-Developers

If you're not building applications with the API, you might wonder why any of this matters to you. It matters because every time Claude does something beyond generating text — when Claude Code edits a file, when Claude.ai searches the web, when an MCP-connected Claude checks your calendar — tool use is the mechanism underneath. Understanding it helps you understand why some things work reliably (simple, well-defined tool calls) and some things are flaky (long chains of dependent actions). It helps you give better instructions — "check my calendar for next Tuesday" is a cleaner tool invocation than "figure out when I'm free sometime next week." And it helps you understand the difference between a Claude that just knows things and a Claude that's connected to the systems where your work actually lives.

The trajectory here is clear. Models that can only generate text are a transitional form. The future is models that can take actions — read data, write data, interact with services, operate software. Tool use is the mechanism that makes this possible. It's not flashy, it's not the feature that makes headlines, and it requires real engineering work to set up properly. But it's the foundation that everything else — agents, MCP, Claude Code, automated workflows — is built on. Understanding it pays compound interest on everything else you learn about Claude.


This article is part of the Claude Deep Cuts series at CustomClanker.