Tool Calling With Open-Source LLMs: A Complete Guide

Tool calling (also called function calling) lets an LLM signal that it wants to invoke a function, rather than just returning text. The model does not execute anything itself -- it outputs a structured request, and your application code runs the function and feeds the result back. This back-and-forth is what makes it possible to build assistants that can look up data, call APIs, or take actions in the world.

This guide covers how tool calling works mechanically, which open-source models support it, and complete Python and Node.js examples you can drop into your own project.

How Tool Calling Works

The flow is straightforward but worth spelling out:

You define one or more tools, each with a name, a description, and a JSON Schema describing its parameters and their types.
You send those definitions alongside the conversation messages to the LLM.
The model either responds with text (if it can answer directly) or with a tool_calls array, where each entry names a function and the arguments it wants to pass.
Your application runs the function, gets a result, and adds a tool role message with that result back into the conversation.
You call the LLM again with the updated message list. It reads the tool result and produces a final response.

The model never calls your code. It only produces a structured JSON payload describing what it wants called and with what arguments. Your application is in full control of whether and how to execute it.
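Because the model only proposes calls, your code decides what actually runs. The sketch below shows one way to use that control: an allowlist, so that any name the model produces that you did not register is refused rather than executed. The `ALLOWED_TOOLS` registry and the `handle_tool_request` helper are hypothetical names for illustration, not part of any SDK.

```python
import json
from typing import Any, Callable


def get_weather(location: str, unit: str = "fahrenheit") -> dict:
    # Stand-in for a real weather API call.
    return {"temperature": 62, "unit": unit, "condition": "fog", "location": location}


# Hypothetical allowlist: only functions registered here can ever run,
# whatever name the model puts in a tool call.
ALLOWED_TOOLS: dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def handle_tool_request(name: str, arguments_json: str) -> dict:
    """Decide whether to run a model-requested call. The model itself runs nothing."""
    func = ALLOWED_TOOLS.get(name)
    if func is None:
        # Refuse instead of executing anything the application did not register.
        return {"error": f"Tool '{name}' is not available", "success": False}
    args = json.loads(arguments_json)
    return func(**args)


print(handle_tool_request("get_weather", '{"location": "San Francisco"}'))
print(handle_tool_request("delete_files", '{"path": "/"}'))
```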

The Message Structure

A conversation with tool calls looks like this:

system: You are a helpful assistant.
user: What's the weather in San Francisco?
assistant: [tool_calls: get_weather(location="San Francisco")]
tool: {"temperature": 62, "condition": "fog"}
assistant: It's 62°F and foggy in San Francisco right now.

Each entry in the assistant's tool_calls names a function and carries its arguments as a JSON-encoded string. The tool message contains whatever your code returned from actually running that function. This full history is sent on the next request so the model has context for its final reply.
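The same conversation in the OpenAI-style chat-completions message format looks roughly like this. The id `call_1` is made up for illustration; real ids come from the API response. The point to notice is that `arguments` is a string that your code has to decode, and that the tool message points back to the call through `tool_call_id`.

```python
import json

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What's the weather in San Francisco?"},
    {
        "role": "assistant",
        "content": None,
        "tool_calls": [
            {
                "id": "call_1",  # illustrative id; the API generates real ones
                "type": "function",
                "function": {
                    "name": "get_weather",
                    # Arguments arrive as a JSON-encoded string, not a nested object.
                    "arguments": json.dumps({"location": "San Francisco"}),
                },
            }
        ],
    },
    {
        "role": "tool",
        "tool_call_id": "call_1",  # must match the id in the assistant message
        "content": json.dumps({"temperature": 62, "condition": "fog"}),
    },
]

# Decode the arguments the way your application would before running the function.
args = json.loads(messages[2]["tool_calls"][0]["function"]["arguments"])
print(args["location"])
```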

Models That Support Tool Calling

Tool calling requires models that were specifically trained to produce structured tool_calls responses. Not every open-source model supports this -- it depends on whether the model was fine-tuned for function calling and whether the inference server supports parsing those outputs.

Models available on GeneralCompute that support tool calling:

| Model | Notes |
|---|---|
| Llama 4 Maverick | Strong tool calling, handles parallel and sequential tool use well |
| Llama 4 Scout | Faster, lower cost, good for simpler tool use |
| Qwen3-Coder | Excellent for code-adjacent tool calls, returns well-typed arguments |
| DeepSeek V3 | Strong reasoning about when to call tools vs. answer directly |

All of these work through GeneralCompute's OpenAI-compatible API, so you use the same tools and tool_choice parameters you would with the OpenAI SDK.

Python Example

Install the OpenAI SDK if you haven't:

pip install openai

Defining Tools

Tools are described as JSON Schema objects. Here's a simple weather tool:

import json
from openai import OpenAI

client = OpenAI(
    api_key="your_generalcompute_api_key",
    base_url="https://api.generalcompute.com/v1",
)

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a location.",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "City name, e.g. 'San Francisco'",
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "Temperature unit. Defaults to fahrenheit.",
                    },
                },
                "required": ["location"],
            },
        },
    }
]
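The model generates the arguments, so nothing guarantees they match the schema you sent. Before calling your function, you can check them with something like the minimal sketch below. It only handles required keys, unknown keys, string types, and enums, not full JSON Schema. A dedicated validator such as the `jsonschema` package covers the rest. `check_arguments` is a hypothetical helper.

```python
import json

# Compact copy of the get_weather "parameters" schema defined above.
WEATHER_PARAMS = {
    "type": "object",
    "properties": {
        "location": {"type": "string"},
        "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
    },
    "required": ["location"],
}


def check_arguments(arguments_json: str, schema: dict) -> list[str]:
    """Return a list of problems. An empty list means the arguments look usable.

    Covers only required keys, unknown keys, string types, and enums.
    """
    try:
        args = json.loads(arguments_json)
    except json.JSONDecodeError as exc:
        return [f"arguments are not valid JSON: {exc}"]
    if not isinstance(args, dict):
        return ["arguments must be a JSON object"]

    problems = []
    props = schema.get("properties", {})
    for key in schema.get("required", []):
        if key not in args:
            problems.append(f"missing required argument '{key}'")
    for key, value in args.items():
        spec = props.get(key)
        if spec is None:
            problems.append(f"unexpected argument '{key}'")
            continue
        if spec.get("type") == "string" and not isinstance(value, str):
            problems.append(f"'{key}' should be a string")
        if "enum" in spec and value not in spec["enum"]:
            problems.append(f"'{key}' must be one of {spec['enum']}")
    return problems


print(check_arguments('{"unit": "kelvin"}', WEATHER_PARAMS))
```

Returning the list of problems as a tool result, rather than raising, gives the model a chance to retry with corrected arguments.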

Running the Tool Loop

def get_weather(location: str, unit: str = "fahrenheit") -> dict:
    # In a real app this would call a weather API
    return {"temperature": 62, "unit": unit, "condition": "fog", "location": location}

def run_with_tools(user_message: str) -> str:
    messages = [{"role": "user", "content": user_message}]

    while True:
        response = client.chat.completions.create(
            model="llama-4-maverick",
            messages=messages,
            tools=tools,
            tool_choice="auto",
        )

        message = response.choices[0].message

        # If no tool call, we're done
        if not message.tool_calls:
            return message.content

        # Add the assistant's tool_calls message to history
        messages.append(message)

        # Execute each tool call and add results
        for tool_call in message.tool_calls:
            func_name = tool_call.function.name
            func_args = json.loads(tool_call.function.arguments)

            if func_name == "get_weather":
                result = get_weather(**func_args)
            else:
                result = {"error": f"Unknown function: {func_name}"}

            messages.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": json.dumps(result),
            })

        # Loop back to call the model again with tool results

result = run_with_tools("What's the weather in San Francisco?")
print(result)

A few things to note:

tool_choice="auto" lets the model decide when to call tools. You can also set it to "required" to force a tool call, or pass {"type": "function", "function": {"name": "get_weather"}} to force a specific function.
The tool_call_id on each tool result message must match the id of the corresponding tool_call object, and every tool call in the assistant message needs its own result message. This is how the model correlates results with requests when multiple tools fire in parallel.
The while True loop handles the case where the model calls multiple tools in sequence -- it keeps going until it returns a plain text response with no tool calls.
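A `while True` loop has no upper bound. If a model keeps requesting tools, it never exits. The sketch below is a hypothetical variant of `run_with_tools` with two changes: it caps the number of model calls, and it replaces the if/elif chain with a dictionary of functions, which scales better as you add tools. The model call is passed in as a function, so the demo can use a stub with no network access.

```python
import json
from types import SimpleNamespace
from typing import Any, Callable


def run_tool_loop(
    call_model: Callable[[list], Any],
    messages: list,
    registry: dict,
    max_turns: int = 5,
) -> str:
    """Variant of run_with_tools with a dispatch dict and a turn limit.

    call_model takes the message list and returns an object with .content and
    .tool_calls, e.g. client.chat.completions.create(...).choices[0].message.
    """
    for _ in range(max_turns):
        message = call_model(messages)
        if not message.tool_calls:
            return message.content
        messages.append(message)
        for tool_call in message.tool_calls:
            func = registry.get(tool_call.function.name)
            if func is None:
                result = {"error": f"Unknown function: {tool_call.function.name}"}
            else:
                result = func(**json.loads(tool_call.function.arguments))
            messages.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": json.dumps(result),
            })
    # Stop instead of looping forever if the model keeps requesting tools.
    raise RuntimeError(f"No final answer after {max_turns} model calls")


# Offline demo with a stub model (no API calls): one tool call, then an answer.
def stub_model(messages):
    if not any(isinstance(m, dict) and m.get("role") == "tool" for m in messages):
        call = SimpleNamespace(
            id="call_1",
            function=SimpleNamespace(name="get_weather", arguments='{"location": "San Francisco"}'),
        )
        return SimpleNamespace(content=None, tool_calls=[call])
    return SimpleNamespace(content="It's 62°F and foggy.", tool_calls=None)


history = [{"role": "user", "content": "What's the weather in San Francisco?"}]
answer = run_tool_loop(
    stub_model,
    history,
    {"get_weather": lambda location, unit="fahrenheit": {"temperature": 62, "location": location}},
)
print(answer)
```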

Parallel Tool Calls

A well-trained model will sometimes issue multiple tool calls in a single response when it determines the calls are independent. For example, if you ask "What's the weather in SF and NYC?", the model might request both in one shot rather than asking for them one at a time:

# The response might contain two entries in message.tool_calls:
#   tool_calls[0].function.arguments == '{"location": "San Francisco"}'
#   tool_calls[1].function.arguments == '{"location": "New York City"}'
# Both have function.name == "get_weather", and each has its own tool_call.id.
for tool_call in message.tool_calls:
    ...  # run each call and append a tool message carrying its id

The loop above handles this automatically since it iterates over all tool_calls in the response.
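That loop runs the calls one after another. Parallel calls are independent by definition, so if your tools do slow I/O you could run them concurrently instead. The sketch below uses a standard-library thread pool, which is an optional swap-in rather than something the API requires. `pool.map` returns results in request order, and each result keeps its own `tool_call_id`. The stub `get_weather` and the `SimpleNamespace` tool calls stand in for real functions and SDK objects.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from types import SimpleNamespace


def get_weather(location: str, unit: str = "fahrenheit") -> dict:
    # Stand-in for a slow network call.
    return {"temperature": 62, "unit": unit, "condition": "fog", "location": location}


REGISTRY = {"get_weather": get_weather}


def run_one(tool_call) -> dict:
    """Execute one tool call and wrap the result as a tool message."""
    func = REGISTRY[tool_call.function.name]
    result = func(**json.loads(tool_call.function.arguments))
    return {"role": "tool", "tool_call_id": tool_call.id, "content": json.dumps(result)}


def run_parallel(tool_calls) -> list[dict]:
    """Run independent tool calls concurrently. Results come back in request order."""
    with ThreadPoolExecutor(max_workers=len(tool_calls) or 1) as pool:
        return list(pool.map(run_one, tool_calls))


calls = [
    SimpleNamespace(id="call_a", function=SimpleNamespace(
        name="get_weather", arguments='{"location": "San Francisco"}')),
    SimpleNamespace(id="call_b", function=SimpleNamespace(
        name="get_weather", arguments='{"location": "New York City"}')),
]
tool_messages = run_parallel(calls)
# In the real loop: messages.extend(tool_messages)
```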

Node.js Example

npm install openai

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "your_generalcompute_api_key",
  baseURL: "https://api.generalcompute.com/v1",
});

const tools = [
  {
    type: "function",
    function: {
      name: "get_weather",
      description: "Get the current weather for a location.",
      parameters: {
        type: "object",
        properties: {
          location: {
            type: "string",
            description: "City name, e.g. 'San Francisco'",
          },
          unit: {
            type: "string",
            enum: ["celsius", "fahrenheit"],
          },
        },
        required: ["location"],
      },
    },
  },
];

function getWeather(location, unit = "fahrenheit") {
  // Real implementation would call a weather API
  return { temperature: 62, unit, condition: "fog", location };
}

async function runWithTools(userMessage) {
  const messages = [{ role: "user", content: userMessage }];

  while (true) {
    const response = await client.chat.completions.create({
      model: "llama-4-maverick",
      messages,
      tools,
      tool_choice: "auto",
    });

    const message = response.choices[0].message;

    if (!message.tool_calls || message.tool_calls.length === 0) {
      return message.content;
    }

    messages.push(message);

    for (const toolCall of message.tool_calls) {
      const funcName = toolCall.function.name;
      const funcArgs = JSON.parse(toolCall.function.arguments);

      let result;
      if (funcName === "get_weather") {
        result = getWeather(funcArgs.location, funcArgs.unit);
      } else {
        result = { error: `Unknown function: ${funcName}` };
      }

      messages.push({
        role: "tool",
        tool_call_id: toolCall.id,
        content: JSON.stringify(result),
      });
    }
  }
}

const answer = await runWithTools("What's the weather in San Francisco?");
console.log(answer);

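The structure is identical to the Python version: the while loop runs until the model stops requesting tool calls and returns a plain response. Because the example uses import syntax and top-level await, run it as an ES module, for example by saving it as a .mjs file or setting "type": "module" in package.json.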

Writing Better Tool Descriptions

The description fields in your tool schema matter more than they might seem. The model decides whether to call a tool, and which arguments to pass, based largely on the tool's name, its description, and the descriptions of its parameters. Vague descriptions lead to missed calls or wrong arguments.

A few patterns that work well:

Be specific about what the function returns, not just what it does:

"description": "Fetch the current temperature and weather conditions for a city. Returns temperature in the requested unit, a short condition string (e.g. 'sunny', 'fog'), and humidity percentage."

List units and formats explicitly for numeric parameters:

"description": "Unix timestamp in seconds. Use the start of the day if no specific time is given."

Describe edge cases the model should know about:

"description": "Look up a user by email address. Returns null if no account exists. Do not call this for phone numbers."

Bad descriptions produce bad function calls. If the model passes arguments of the wrong type or skips tool calls it should be making, the description is usually the first thing to fix.
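Put together, the three patterns give a tool definition like the one sketched below. The `list_orders_for_user` tool, its return shape, and its parameters are hypothetical and only illustrate the wording. The description says what comes back, the timestamp format is spelled out, and the edge cases are named.

```python
import json

# Hypothetical tool definition that applies the patterns above.
orders_tool = {
    "type": "function",
    "function": {
        "name": "list_orders_for_user",
        "description": (
            "List orders for the user with this email address. Returns an array "
            "of orders, each with an id, a total in US dollars, and created_at as "
            "a Unix timestamp in seconds. Returns an empty array if the account "
            "has no orders or does not exist. Do not call this for phone numbers."
        ),
        "parameters": {
            "type": "object",
            "properties": {
                "email": {
                    "type": "string",
                    "description": "Full email address, e.g. 'ada@example.com'.",
                },
                "since": {
                    "type": "integer",
                    "description": (
                        "Only return orders created at or after this Unix timestamp "
                        "in seconds. Use the start of the day if no specific time "
                        "is given."
                    ),
                },
            },
            "required": ["email"],
        },
    },
}

# The definition must be plain JSON-serializable data to be sent with the request.
print(json.dumps(orders_tool, indent=2))
```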

Handling Tool Errors

Tools fail. A function might throw, an API might time out, or the model might pass an argument your function doesn't accept. Rather than letting an exception crash your loop, return a structured error to the model:

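try:
    # Parsing inside the try also catches malformed JSON from the model
    func_args = json.loads(tool_call.function.arguments)
    result = get_weather(**func_args)
except Exception as e:
    result = {"error": str(e), "success": False}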

messages.append({
    "role": "tool",
    "tool_call_id": tool_call.id,
    "content": json.dumps(result),
})

The model will read the error and typically tell the user it couldn't complete the action, or retry with corrected arguments if the error message makes clear what went wrong.
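Error messages that say *what* went wrong make those retries more likely to succeed. The sketch below distinguishes four failure modes: an unknown function, malformed JSON, bad arguments, and a runtime failure. It always returns a JSON string ready for the tool message. `safe_execute` and the `{"success": ..., "result": ...}` shape are hypothetical choices, not a required format.

```python
import json
from typing import Any, Callable


def get_weather(location: str, unit: str = "fahrenheit") -> dict:
    return {"temperature": 62, "unit": unit, "condition": "fog", "location": location}


TOOLS: dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def safe_execute(name: str, arguments_json: str, registry: dict) -> str:
    """Run one tool call and always return a JSON string for the tool message."""
    func = registry.get(name)
    if func is None:
        return json.dumps({"success": False, "error": f"Unknown function: {name}"})
    try:
        args = json.loads(arguments_json)
    except json.JSONDecodeError as exc:
        return json.dumps({"success": False, "error": f"Arguments were not valid JSON: {exc}"})
    try:
        result = func(**args)
    except TypeError as exc:
        # Usually a missing or unexpected keyword argument (or arguments that
        # are not a JSON object), which the model can correct and retry.
        return json.dumps({"success": False, "error": f"Bad arguments for {name}: {exc}"})
    except Exception as exc:
        return json.dumps({"success": False, "error": f"{name} failed: {exc}"})
    return json.dumps({"success": True, "result": result})


print(safe_execute("get_weather", '{"city": "San Francisco"}', TOOLS))
```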

Streaming With Tool Calls

Streaming and tool calls can coexist. When you set stream=True, tool call arguments arrive in chunks across multiple delta events and need to be assembled:

stream = client.chat.completions.create(
    model="llama-4-maverick",
    messages=messages,
    tools=tools,
    stream=True,
)

tool_calls_buffer = {}

for chunk in stream:
    # Some chunks (e.g. a final usage chunk) carry no choices
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta

    if delta.tool_calls:
        for tc in delta.tool_calls:
            idx = tc.index
            if idx not in tool_calls_buffer:
                tool_calls_buffer[idx] = {"id": "", "name": "", "arguments": ""}
            if tc.id:
                tool_calls_buffer[idx]["id"] = tc.id
            # function can be None on some deltas, so check before reading it
            if tc.function and tc.function.name:
                tool_calls_buffer[idx]["name"] += tc.function.name
            if tc.function and tc.function.arguments:
                tool_calls_buffer[idx]["arguments"] += tc.function.arguments

    if delta.content:
        print(delta.content, end="", flush=True)

The index field on each streaming tool call delta tells you which tool call it belongs to when multiple tools are requested in parallel. Accumulate by index, then process the assembled calls once the stream finishes.
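Once the stream ends, the buffer can be turned back into the same shape a non-streaming response would have had. That lets the rest of the tool loop stay unchanged. The sketch below assumes the `tool_calls_buffer` layout from the streaming code. The sample buffer contents and the stub `get_weather` are made up for illustration, and `buffer_to_assistant_message` is a hypothetical helper.

```python
import json


def buffer_to_assistant_message(tool_calls_buffer: dict, content: str = "") -> dict:
    """Rebuild an assistant message from the streamed buffer, ordered by index."""
    return {
        "role": "assistant",
        "content": content or None,
        "tool_calls": [
            {
                "id": call["id"],
                "type": "function",
                "function": {"name": call["name"], "arguments": call["arguments"]},
            }
            for _, call in sorted(tool_calls_buffer.items())
        ],
    }


def get_weather(location: str, unit: str = "fahrenheit") -> dict:
    return {"temperature": 62, "unit": unit, "condition": "fog", "location": location}


# Example buffer as the streaming loop might leave it (made-up ids).
tool_calls_buffer = {
    1: {"id": "call_b", "name": "get_weather", "arguments": '{"location": "New York City"}'},
    0: {"id": "call_a", "name": "get_weather", "arguments": '{"location": "San Francisco"}'},
}

messages = [{"role": "user", "content": "What's the weather in SF and NYC?"}]
assistant_msg = buffer_to_assistant_message(tool_calls_buffer)
messages.append(assistant_msg)

# Arguments are only complete now, so parse them after the stream has finished.
for call in assistant_msg["tool_calls"]:
    args = json.loads(call["function"]["arguments"])
    messages.append({
        "role": "tool",
        "tool_call_id": call["id"],
        "content": json.dumps(get_weather(**args)),
    })
```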

When Models Skip Tool Calls

Sometimes a model will answer a question directly rather than calling a tool you expected it to use. This usually means:

The question can be answered from training data (e.g., "What is the boiling point of water?" does not need a weather API).
The tool description does not clearly signal it should be used here.
tool_choice is set to "auto", which lets the model decide.

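If you need to guarantee the model uses a tool, set tool_choice="required" or pin it to a specific function. For agentic workflows where the model should always attempt to gather live data, required mode removes the ambiguity. In a loop like run_with_tools, force the tool only on the first request and switch back to "auto" once the tool results are in the history. Otherwise the model is forced to call a tool on every turn and never returns a plain-text answer.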
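One way to force a tool only until it has been used is to compute tool_choice per request from the message history. The helper below is a hypothetical sketch. It looks only at messages after the latest user turn, so a new question is forced again. You would pass its return value as `tool_choice=` in place of `"auto"`.

```python
from typing import Optional, Union


def pick_tool_choice(messages: list, force_function: Optional[str] = None) -> Union[str, dict]:
    """Force a tool call until a tool result follows the latest user message."""
    last_user = max(
        (i for i, m in enumerate(messages) if isinstance(m, dict) and m.get("role") == "user"),
        default=-1,
    )
    already_called = any(
        isinstance(m, dict) and m.get("role") == "tool" for m in messages[last_user + 1:]
    )
    if already_called:
        # Let the model answer in plain text now that it has the data.
        return "auto"
    if force_function:
        return {"type": "function", "function": {"name": force_function}}
    return "required"


history = [{"role": "user", "content": "What's the weather in San Francisco?"}]
print(pick_tool_choice(history, "get_weather"))
```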

Putting It Together

Tool calling is the building block for most useful AI agents. Once you have the request/response loop working, the complexity is mostly in the tools themselves: making them reliable, fast, and well-described.

A few good next steps after this example:

Add a database lookup tool and connect it to real data.
Build a multi-tool agent with 5-10 functions and test how well the model decides between them.
Add streaming so the user sees the model "thinking" before your code executes a tool.
Measure tool call latency at scale -- each round trip through an LLM adds up quickly in multi-step workflows, which is why inference speed matters more for agents than for simple chat.
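A simple starting point for measuring latency is to time each model round trip and each tool execution separately, so you can see which one dominates. The sketch below uses a small context manager. The `time.sleep` calls are stand-ins for a real request and a real tool, and the labels are arbitrary.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label: str, log: list):
    """Append (label, seconds elapsed) to log when the with-block exits."""
    start = time.perf_counter()
    try:
        yield
    finally:
        log.append((label, time.perf_counter() - start))


timings = []
# time.sleep stands in for a real model request or tool execution.
with timed("model call 1 (requests tool)", timings):
    time.sleep(0.05)
with timed("get_weather", timings):
    time.sleep(0.02)
with timed("model call 2 (final answer)", timings):
    time.sleep(0.05)

for label, seconds in timings:
    print(f"{label}: {seconds * 1000:.0f} ms")
print(f"total: {sum(s for _, s in timings) * 1000:.0f} ms")
```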

GeneralCompute's API is fully compatible with the tool calling examples above. If you already have code using the OpenAI SDK, change base_url to https://api.generalcompute.com/v1, use your GeneralCompute API key and one of its model names, and your tool calls will work without other modification. The API docs cover available models and rate limits.