Tool Calling With Open-Source LLMs: A Complete Guide
Tool calling (also called function calling) lets an LLM signal that it wants to invoke a function, rather than just returning text. The model does not execute anything itself -- it outputs a structured request, and your application code runs the function and feeds the result back. This back-and-forth is what makes it possible to build assistants that can look up data, call APIs, or take actions in the world.
This guide covers how tool calling works mechanically, which open-source models support it, and complete Python and Node.js examples you can drop into your own project.
How Tool Calling Works
The flow is straightforward but worth spelling out:
- You define one or more tools as JSON Schema objects describing the function name, parameters, and types.
- You send those definitions alongside the conversation messages to the LLM.
- The model either responds with text (if it can answer directly) or responds with a tool_calls object naming the function and the arguments it wants to pass.
- Your application runs the function, gets a result, and adds a tool role message with that result back into the conversation.
- You call the LLM again with the updated message list. It reads the tool result and produces a final response.
The model never calls your code. It only produces a structured JSON payload describing what it wants called and with what arguments. Your application is in full control of whether and how to execute it.
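To make "your application is in full control" concrete, here is a minimal sketch of how an application might inspect a tool call before running anything. The payload is hand-written in the OpenAI-compatible shape, the id `call_abc123` is made up, and `ALLOWED_TOOLS` and `plan_execution` are hypothetical names, not part of any SDK.

```python
import json

# Hand-written example of an assistant message that requests a tool call.
# Field names follow the OpenAI-compatible chat format; the id is made up.
assistant_message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [
        {
            "id": "call_abc123",
            "type": "function",
            "function": {
                "name": "get_weather",
                # Arguments arrive as a JSON-encoded string, not a dict.
                "arguments": '{"location": "San Francisco"}',
            },
        }
    ],
}

# Hypothetical allowlist: the application, not the model, decides what may run.
ALLOWED_TOOLS = {"get_weather"}


def plan_execution(message: dict) -> list[dict]:
    """Return the parsed calls the application is willing to execute."""
    planned = []
    for call in message.get("tool_calls") or []:
        name = call["function"]["name"]
        if name not in ALLOWED_TOOLS:
            continue  # skip anything that was not explicitly allowed
        planned.append({
            "id": call["id"],
            "name": name,
            "args": json.loads(call["function"]["arguments"]),
        })
    return planned


planned_calls = plan_execution(assistant_message)
print(planned_calls)
```

Nothing runs until your code decides it should, so this is also a natural place for permission checks or user confirmation.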
The Message Structure
A conversation with tool calls looks like this:
system: You are a helpful assistant.
user: What's the weather in San Francisco?
assistant: [tool_calls: get_weather(location="San Francisco")]
tool: {"temperature": 62, "condition": "fog"}
assistant: It's 62°F and foggy in San Francisco right now.
Each entry in the assistant's tool_calls is a JSON object that names the function and carries its arguments as a JSON-encoded string. The tool message contains whatever your code returned from actually running that function. This full history is sent on the next request so the model has context for its final reply.
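Written out as the message list you would send on the follow-up request, the conversation above might look like the sketch below. The id `call_abc123` is invented for illustration, and `unanswered_tool_calls` is a hypothetical helper that checks every requested call has a matching tool message before you call the model again.

```python
import json

tool_result = {"temperature": 62, "condition": "fog"}

history = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What's the weather in San Francisco?"},
    {
        "role": "assistant",
        "content": None,
        "tool_calls": [
            {
                "id": "call_abc123",  # made-up id for illustration
                "type": "function",
                "function": {
                    "name": "get_weather",
                    "arguments": json.dumps({"location": "San Francisco"}),
                },
            }
        ],
    },
    {
        "role": "tool",
        "tool_call_id": "call_abc123",  # must match the id above
        "content": json.dumps(tool_result),
    },
]


def unanswered_tool_calls(messages: list[dict]) -> set[str]:
    """Return ids of tool calls that have no matching tool message yet."""
    requested = {
        call["id"]
        for m in messages
        if m["role"] == "assistant"
        for call in (m.get("tool_calls") or [])
    }
    answered = {m["tool_call_id"] for m in messages if m["role"] == "tool"}
    return requested - answered


print(unanswered_tool_calls(history))      # every call has a result
print(unanswered_tool_calls(history[:3]))  # the tool message is missing
```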
Models That Support Tool Calling
Tool calling requires models that were specifically trained to produce structured tool_calls responses. Not every open-source model supports this -- it depends on whether the model was fine-tuned for function calling and whether the inference server supports parsing those outputs.
Models available on GeneralCompute that support tool calling:
| Model | Notes |
|---|---|
| Llama 4 Maverick | Strong tool calling, handles parallel and sequential tool use well |
| Llama 4 Scout | Faster, lower cost, good for simpler tool use |
| Qwen3-Coder | Excellent for code-adjacent tool calls, returns well-typed arguments |
| DeepSeek V3 | Strong reasoning about when to call tools vs. answer directly |
All of these work through GeneralCompute's OpenAI-compatible API, so you use the same tools and tool_choice parameters you would with the OpenAI SDK.
Python Example
Install the OpenAI SDK if you haven't:
pip install openai
Defining Tools
Tools are described as JSON Schema objects. Here's a simple weather tool:
    import json
    from openai import OpenAI

    client = OpenAI(
        api_key="your_generalcompute_api_key",
        base_url="https://api.generalcompute.com/v1",
    )

    tools = [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get the current weather for a location.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": "City name, e.g. 'San Francisco'",
                        },
                        "unit": {
                            "type": "string",
                            "enum": ["celsius", "fahrenheit"],
                            "description": "Temperature unit. Defaults to fahrenheit.",
                        },
                    },
                    "required": ["location"],
                },
            },
        }
    ]
Running the Tool Loop
    def get_weather(location: str, unit: str = "fahrenheit") -> dict:
        # In a real app this would call a weather API
        return {"temperature": 62, "unit": unit, "condition": "fog", "location": location}


    def run_with_tools(user_message: str) -> str:
        messages = [{"role": "user", "content": user_message}]

        while True:
            response = client.chat.completions.create(
                model="llama-4-maverick",
                messages=messages,
                tools=tools,
                tool_choice="auto",
            )
            message = response.choices[0].message

            # If no tool call, we're done
            if not message.tool_calls:
                return message.content

            # Add the assistant's tool_calls message to history
            messages.append(message)

            # Execute each tool call and add results
            for tool_call in message.tool_calls:
                func_name = tool_call.function.name
                func_args = json.loads(tool_call.function.arguments)

                if func_name == "get_weather":
                    result = get_weather(**func_args)
                else:
                    result = {"error": f"Unknown function: {func_name}"}

                messages.append({
                    "role": "tool",
                    "tool_call_id": tool_call.id,
                    "content": json.dumps(result),
                })
            # Loop back to call the model again with tool results


    result = run_with_tools("What's the weather in San Francisco?")
    print(result)
A few things to note:
- tool_choice="auto" lets the model decide when to call tools. You can also set it to "required" to force a tool call, or pass {"type": "function", "function": {"name": "get_weather"}} to force a specific function.
- The tool_call_id on the tool result message must match the id from the corresponding tool_call object. This is how the model correlates results with requests when multiple tools fire in parallel.
- The while True loop handles the case where the model calls multiple tools in sequence -- it keeps going until it returns a plain text response with no tool calls.
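One caveat about the while True loop: if a model keeps requesting tools, it never exits. A minimal sketch of a bounded variant follows. Here `call_model` and `execute_tool` are hypothetical callables you would supply (for example, thin wrappers around client.chat.completions.create and your dispatch code), `max_rounds` is an arbitrary cap, and messages are plain dicts so the loop can be exercised without a network call.

```python
from typing import Callable


def run_bounded(
    messages: list[dict],
    call_model: Callable[[list[dict]], dict],
    execute_tool: Callable[[dict], str],
    max_rounds: int = 5,
) -> str:
    """Run the tool loop, giving up after max_rounds model calls."""
    for _ in range(max_rounds):
        message = call_model(messages)
        tool_calls = message.get("tool_calls") or []
        if not tool_calls:
            return message.get("content") or ""
        messages.append(message)
        for call in tool_calls:
            messages.append({
                "role": "tool",
                "tool_call_id": call["id"],
                "content": execute_tool(call),
            })
    raise RuntimeError(f"No final answer after {max_rounds} rounds")


# Stand-in model for demonstration: requests one tool call, then answers.
def fake_model(messages: list[dict]) -> dict:
    if any(m["role"] == "tool" for m in messages):
        return {"role": "assistant", "content": "It's 62 degrees and foggy."}
    return {
        "role": "assistant",
        "content": None,
        "tool_calls": [{
            "id": "call_1",
            "type": "function",
            "function": {"name": "get_weather", "arguments": "{}"},
        }],
    }


answer = run_bounded(
    [{"role": "user", "content": "Weather in SF?"}],
    fake_model,
    lambda call: '{"temperature": 62, "condition": "fog"}',
)
print(answer)
```

Raising on the cap is one possible policy; returning a fallback message to the user would work just as well.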
Parallel Tool Calls
A well-trained model will sometimes issue multiple tool calls in a single response when it determines the calls are independent. For example, if you ask "What's the weather in SF and NYC?", the model might request both in one shot rather than asking for them one at a time:
    # Response might contain two tool_calls
    for tool_call in message.tool_calls:
        # tool_call.function.name == "get_weather"
        # tool_call.function.arguments == '{"location": "San Francisco"}'
        # tool_call.function.arguments == '{"location": "New York City"}'
        ...
The loop above handles this automatically since it iterates over all tool_calls in the response.
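The loop above still executes the calls one after another. When the tools are I/O-bound and independent, you may want to run them concurrently. The sketch below uses the standard library's ThreadPoolExecutor; tool calls are represented as plain dicts with made-up ids, and `run_one` and `run_parallel` are hypothetical helper names.

```python
import json
from concurrent.futures import ThreadPoolExecutor


def get_weather(location: str, unit: str = "fahrenheit") -> dict:
    # Stub standing in for a slow network call.
    return {"temperature": 62, "unit": unit, "condition": "fog", "location": location}


def run_one(call: dict) -> dict:
    """Execute a single tool call and wrap the result as a tool message."""
    args = json.loads(call["function"]["arguments"])
    result = get_weather(**args)
    return {
        "role": "tool",
        "tool_call_id": call["id"],
        "content": json.dumps(result),
    }


def run_parallel(tool_calls: list[dict]) -> list[dict]:
    """Run independent tool calls in threads; results keep request order."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(run_one, tool_calls))


calls = [
    {"id": "call_sf", "function": {"name": "get_weather",
                                   "arguments": '{"location": "San Francisco"}'}},
    {"id": "call_nyc", "function": {"name": "get_weather",
                                    "arguments": '{"location": "New York City"}'}},
]
tool_messages = run_parallel(calls)
print([m["tool_call_id"] for m in tool_messages])
```

Because `pool.map` returns results in input order, each tool message still lines up with the call that produced it.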
Node.js Example
npm install openai
    import OpenAI from "openai";

    const client = new OpenAI({
      apiKey: "your_generalcompute_api_key",
      baseURL: "https://api.generalcompute.com/v1",
    });

    const tools = [
      {
        type: "function",
        function: {
          name: "get_weather",
          description: "Get the current weather for a location.",
          parameters: {
            type: "object",
            properties: {
              location: {
                type: "string",
                description: "City name, e.g. 'San Francisco'",
              },
              unit: {
                type: "string",
                enum: ["celsius", "fahrenheit"],
              },
            },
            required: ["location"],
          },
        },
      },
    ];

    function getWeather(location, unit = "fahrenheit") {
      // Real implementation would call a weather API
      return { temperature: 62, unit, condition: "fog", location };
    }

    async function runWithTools(userMessage) {
      const messages = [{ role: "user", content: userMessage }];

      while (true) {
        const response = await client.chat.completions.create({
          model: "llama-4-maverick",
          messages,
          tools,
          tool_choice: "auto",
        });
        const message = response.choices[0].message;

        if (!message.tool_calls || message.tool_calls.length === 0) {
          return message.content;
        }

        messages.push(message);

        for (const toolCall of message.tool_calls) {
          const funcName = toolCall.function.name;
          const funcArgs = JSON.parse(toolCall.function.arguments);

          let result;
          if (funcName === "get_weather") {
            result = getWeather(funcArgs.location, funcArgs.unit);
          } else {
            result = { error: `Unknown function: ${funcName}` };
          }

          messages.push({
            role: "tool",
            tool_call_id: toolCall.id,
            content: JSON.stringify(result),
          });
        }
      }
    }

    const answer = await runWithTools("What's the weather in San Francisco?");
    console.log(answer);
The structure is identical to the Python version. The while loop runs until the model stops requesting tool calls and returns a plain response.
Writing Better Tool Descriptions
The description fields in your tool schema matter more than they might seem. The model decides whether to call a tool, and which arguments to pass, based almost entirely on the description text. Vague descriptions lead to missed calls or wrong arguments.
A few patterns that work well:
Be specific about what the function returns, not just what it does:
"description": "Fetch the current temperature and weather conditions for a city. Returns temperature in the requested unit, a short condition string (e.g. 'sunny', 'fog'), and humidity percentage."
List units and formats explicitly for numeric parameters:
"description": "Unix timestamp in seconds. Use the start of the day if no specific time is given."
Describe edge cases the model should know about:
"description": "Look up a user by email address. Returns null if no account exists. Do not call this for phone numbers."
Bad descriptions produce bad function calls. If you're seeing the model call with wrong argument types or skip tool calls it should be making, the description is usually the first thing to fix.
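Since descriptions are easy to forget, a small check at startup can catch the gaps. The sketch below is one possible approach, not a feature of any SDK: `missing_descriptions` is a hypothetical helper that walks OpenAI-style tool definitions and reports any function or parameter without a description.

```python
def missing_descriptions(tools: list[dict]) -> list[str]:
    """List tools and parameters that lack a non-empty description."""
    problems = []
    for tool in tools:
        fn = tool["function"]
        if not fn.get("description", "").strip():
            problems.append(fn["name"])
        props = fn.get("parameters", {}).get("properties", {})
        for param, spec in props.items():
            if not spec.get("description", "").strip():
                problems.append(f"{fn['name']}.{param}")
    return problems


tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Fetch the current temperature and weather conditions for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "City name, e.g. 'San Francisco'"},
                # No description here, so the check should flag it.
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["location"],
        },
    },
}]

print(missing_descriptions(tools))  # ['get_weather.unit']
```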
Handling Tool Errors
Tools fail. A function might throw, an API might time out, or the model might pass an argument your function doesn't accept. The most dependable way to handle this is to catch the failure and hand it back to the model as data.
Return structured errors instead of throwing:
    try:
        result = get_weather(**func_args)
    except Exception as e:
        result = {"error": str(e), "success": False}

    messages.append({
        "role": "tool",
        "tool_call_id": tool_call.id,
        "content": json.dumps(result),
    })
The model will read the error and typically tell the user it couldn't complete the action, or retry with corrected arguments if the error message makes clear what went wrong.
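Failures can also happen before the function runs, for example when the arguments string is not valid JSON or names a parameter the function does not have. A minimal sketch of a wrapper that covers those cases is shown below. `TOOLS` and `safe_execute` are hypothetical names, and the error wording is only a suggestion; the point is that the model always gets a readable explanation back.

```python
import json


def get_weather(location: str, unit: str = "fahrenheit") -> dict:
    # Stub; a real implementation would call a weather API.
    return {"temperature": 62, "unit": unit, "condition": "fog", "location": location}


TOOLS = {"get_weather": get_weather}  # hypothetical registry: name -> callable


def safe_execute(name: str, raw_arguments: str) -> dict:
    """Run a tool and always return a JSON-serializable dict."""
    func = TOOLS.get(name)
    if func is None:
        return {"success": False, "error": f"Unknown function: {name}"}
    try:
        args = json.loads(raw_arguments)
    except json.JSONDecodeError as e:
        return {"success": False, "error": f"Arguments are not valid JSON: {e.msg}"}
    if not isinstance(args, dict):
        return {"success": False, "error": "Arguments must be a JSON object"}
    try:
        return {"success": True, "result": func(**args)}
    except TypeError as e:
        # Typically a missing or unexpected keyword argument.
        return {"success": False, "error": f"Bad arguments: {e}"}
    except Exception as e:
        return {"success": False, "error": str(e)}


print(safe_execute("get_weather", '{"location": "San Francisco"}'))
print(safe_execute("get_weather", '{"city": "San Francisco"}'))
print(safe_execute("get_weather", "{not json"))
```

The returned dict can be passed straight to json.dumps as the content of the tool message.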
Streaming With Tool Calls
Streaming and tool calls can coexist. When you set stream=True, tool call arguments arrive in chunks across multiple delta events and need to be assembled:
    stream = client.chat.completions.create(
        model="llama-4-maverick",
        messages=messages,
        tools=tools,
        stream=True,
    )

    tool_calls_buffer = {}

    for chunk in stream:
        # Some servers emit chunks with no choices (e.g. a trailing usage chunk)
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta
        if delta.tool_calls:
            for tc in delta.tool_calls:
                idx = tc.index
                if idx not in tool_calls_buffer:
                    tool_calls_buffer[idx] = {"id": "", "name": "", "arguments": ""}
                if tc.id:
                    tool_calls_buffer[idx]["id"] = tc.id
                # function can be absent on a delta, so check before reading it
                if tc.function and tc.function.name:
                    tool_calls_buffer[idx]["name"] += tc.function.name
                if tc.function and tc.function.arguments:
                    tool_calls_buffer[idx]["arguments"] += tc.function.arguments
        elif delta.content:
            print(delta.content, end="", flush=True)
The index field on each streaming tool call delta tells you which tool call it belongs to when multiple tools are requested in parallel. Accumulate by index, then process the assembled calls once the stream finishes.
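The "process the assembled calls" step might look like the sketch below. It simulates streaming deltas with SimpleNamespace objects so it runs offline; the attribute names (index, id, function.name, function.arguments) mirror the delta fields used above, while `accumulate`, `finalize`, and the sample chunks are hypothetical.

```python
import json
from types import SimpleNamespace as NS


def accumulate(buffer: dict, tool_call_deltas: list) -> None:
    """Merge one chunk's tool call deltas into the buffer, keyed by index."""
    for tc in tool_call_deltas:
        slot = buffer.setdefault(tc.index, {"id": "", "name": "", "arguments": ""})
        if tc.id:
            slot["id"] = tc.id
        if tc.function and tc.function.name:
            slot["name"] += tc.function.name
        if tc.function and tc.function.arguments:
            slot["arguments"] += tc.function.arguments


def finalize(buffer: dict) -> list[dict]:
    """Turn the buffer into tool_calls entries, ordered by index."""
    return [
        {
            "id": slot["id"],
            "type": "function",
            "function": {"name": slot["name"], "arguments": slot["arguments"]},
        }
        for _, slot in sorted(buffer.items())
    ]


# Simulated deltas: one call whose arguments arrive in two fragments.
chunks = [
    [NS(index=0, id="call_1", function=NS(name="get_weather", arguments=""))],
    [NS(index=0, id=None, function=NS(name=None, arguments='{"location": '))],
    [NS(index=0, id=None, function=NS(name=None, arguments='"San Francisco"}'))],
]

buffer: dict = {}
for deltas in chunks:
    accumulate(buffer, deltas)

tool_calls = finalize(buffer)
print(json.loads(tool_calls[0]["function"]["arguments"]))
```

Only parse the arguments once the stream has ended; an individual fragment is usually not valid JSON on its own.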
When Models Skip Tool Calls
Sometimes a model will answer a question directly rather than calling a tool you expected it to use. This usually means:
- The question can be answered from training data (e.g., "What is the boiling point of water?" does not need a weather API).
- The tool description does not clearly signal it should be used here.
- tool_choice is set to "auto", which lets the model decide.
If you need to guarantee the model uses a tool, set tool_choice="required" or pin it to a specific function. For agentic workflows where the model should always attempt to gather live data, required mode removes the ambiguity.
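One thing to watch for: if a forcing tool_choice is sent on every iteration of a tool loop, a server that enforces it strictly will never return a plain text answer. A common pattern, sketched here under that assumption, is to force a tool on the first request only and fall back to "auto" afterwards. `tool_choice_for_round` is a hypothetical helper, not an SDK function.

```python
from typing import Optional, Union


def tool_choice_for_round(
    round_index: int, forced_function: Optional[str] = None
) -> Union[str, dict]:
    """Force a tool on the first round only, then let the model decide."""
    if round_index > 0:
        return "auto"
    if forced_function is not None:
        # Pin the model to one specific function.
        return {"type": "function", "function": {"name": forced_function}}
    return "required"


print(tool_choice_for_round(0))
print(tool_choice_for_round(0, "get_weather"))
print(tool_choice_for_round(1, "get_weather"))
```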
Putting It Together
Tool calling is the building block for most useful AI agents. Once you have the request/response loop working, the complexity is mostly in the tools themselves: making them reliable, fast, and well-described.
A few good next steps after this example:
- Add a database lookup tool and connect it to real data.
- Build a multi-tool agent with 5-10 functions and test how well the model decides between them.
- Add streaming so the user sees the model "thinking" before it executes a tool.
- Measure tool call latency at scale -- each round trip through an LLM adds up quickly in multi-step workflows, which is why inference speed matters more for agents than for simple chat.
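For the latency measurement, a rough starting point is to time model round trips and tool executions separately. The sketch below uses only the standard library; `timed` and `timings` are hypothetical names, and the time.sleep calls stand in for a real model request and a real tool.

```python
import time
from contextlib import contextmanager

timings: dict[str, list[float]] = {}


@contextmanager
def timed(label: str):
    """Record the wall-clock duration of the enclosed block under label."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings.setdefault(label, []).append(time.perf_counter() - start)


# Stand-ins for a model round trip and a tool execution.
for _ in range(2):
    with timed("model_call"):
        time.sleep(0.01)
    with timed("tool_exec"):
        time.sleep(0.005)

for label, samples in timings.items():
    print(f"{label}: n={len(samples)} total={sum(samples):.3f}s")
```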
GeneralCompute's API is fully compatible with the tool calling examples above. If you already have code using the OpenAI SDK, change base_url to https://api.generalcompute.com/v1 and your tool calls will work without modification. The API docs cover available models and rate limits.