OpenAI-Compatible APIs: How to Migrate Your App in 5 Minutes

If your app already uses the OpenAI API, switching providers takes about five minutes. The OpenAI API has become the de facto standard interface for LLM services, and most alternative providers -- including GeneralCompute -- implement the same REST spec. You change a base URL, an API key, and a model string, and your existing code keeps working.

This guide covers what "OpenAI-compatible" actually means, which endpoints are supported, and how to update integrations in Python, Node.js, LangChain, LlamaIndex, and the Vercel AI SDK.

What "OpenAI-Compatible" Means

The OpenAI API is a REST interface. Clients send JSON to specific paths like /v1/chat/completions and get back responses in a defined shape. When a provider says their API is OpenAI-compatible, they mean they implement those same paths and return responses in the same JSON format.
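
To make "the same JSON format" concrete, here is a minimal sketch of the response shape a chat completions call returns and how a client reads it. The field values in the sample are invented for illustration; only the structure reflects the OpenAI schema, and the helper name is hypothetical.

```python
import json

# A trimmed example of a chat completions response body.
# Values are made up; only the shape matters here.
SAMPLE_RESPONSE = """
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1700000000,
  "model": "llama-4-maverick",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "Hello!"},
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 5, "completion_tokens": 2, "total_tokens": 7}
}
"""


def first_message_text(body: str) -> str:
    """Return the assistant text from a chat completion response body."""
    data = json.loads(body)
    return data["choices"][0]["message"]["content"]


print(first_message_text(SAMPLE_RESPONSE))
```

Any provider whose responses parse this way can sit behind the same client code.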

This matters because the OpenAI SDK -- both the Python and Node.js versions -- accepts a base_url parameter. If the provider's API returns responses that match the OpenAI schema, the SDK handles the rest: streaming, retries, error parsing, type annotations. You never need to write raw HTTP code.

The Base URL Swap

The Python and Node.js OpenAI SDKs both accept a base_url (or baseURL) constructor argument. For GeneralCompute:

Python:

from openai import OpenAI

client = OpenAI(
    api_key="your_generalcompute_api_key",
    base_url="https://api.generalcompute.com/v1",
)

Node.js:

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.GENERALCOMPUTE_API_KEY,
  baseURL: "https://api.generalcompute.com/v1",
});

Everything else -- client.chat.completions.create(...), streaming with for await, error handling -- stays identical.

If you're using environment variables, some apps read OPENAI_API_KEY and OPENAI_BASE_URL directly. In that case:

export OPENAI_BASE_URL=https://api.generalcompute.com/v1
export OPENAI_API_KEY=your_generalcompute_api_key

If the model name also comes from configuration, you don't need to change any code at all in those cases. Otherwise, the model string is the only edit.
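
As a rough illustration of what reading those variables looks like, the sketch below resolves a base URL and key from the environment and falls back to OpenAI's default endpoint. The `ProviderConfig` and `resolve_provider_config` names are hypothetical and not part of any SDK.

```python
import os
from typing import Mapping, NamedTuple, Optional

# OpenAI's own endpoint, used when no override is set.
DEFAULT_BASE_URL = "https://api.openai.com/v1"


class ProviderConfig(NamedTuple):
    base_url: str
    api_key: Optional[str]


def resolve_provider_config(env: Mapping[str, str] = os.environ) -> ProviderConfig:
    """Read OPENAI_BASE_URL and OPENAI_API_KEY, defaulting the URL to OpenAI."""
    base_url = env.get("OPENAI_BASE_URL") or DEFAULT_BASE_URL
    # Strip a trailing slash so later path joins don't produce "//".
    return ProviderConfig(base_url=base_url.rstrip("/"), api_key=env.get("OPENAI_API_KEY"))


config = resolve_provider_config(
    {
        "OPENAI_BASE_URL": "https://api.generalcompute.com/v1/",
        "OPENAI_API_KEY": "your_generalcompute_api_key",
    }
)
print(config.base_url)
```

Passing the environment in as a mapping keeps the lookup easy to test without touching the real process environment.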

Supported Endpoints

Most production apps rely on a handful of endpoints. Here's what GeneralCompute supports:

| Endpoint | Status | Notes |
|---|---|---|
| POST /v1/chat/completions | Supported | Full streaming, tool calling, JSON mode |
| POST /v1/completions | Supported | Legacy text completions |
| POST /v1/embeddings | Supported | Multiple embedding models |
| GET /v1/models | Supported | Lists available models |
| POST /v1/audio/transcriptions | Supported | Whisper-compatible transcription |

Chat completions are where most apps spend their time. The request format is identical: a messages array, a model string, and optional parameters like temperature, max_tokens, stream, and tools.
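
For reference, this is roughly what the SDK puts on the wire. The sketch builds the HTTP request with the standard library but does not send it. It assumes the provider uses the same `Authorization: Bearer` header as OpenAI, and `build_chat_request` is a hypothetical helper.

```python
import json
import urllib.request

BASE_URL = "https://api.generalcompute.com/v1"


def build_chat_request(api_key, model, messages, **params) -> urllib.request.Request:
    """Assemble (but do not send) a POST to /chat/completions."""
    # Optional parameters such as temperature or max_tokens sit next to
    # model and messages at the top level of the JSON body.
    payload = {"model": model, "messages": messages, **params}
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed to match OpenAI's auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_chat_request(
    "your_generalcompute_api_key",
    "llama-4-maverick",
    [{"role": "user", "content": "Hello"}],
    temperature=0.2,
    max_tokens=64,
)
print(req.get_method(), req.full_url)
print(json.loads(req.data))
```

In practice the SDK does this for you; the sketch is only meant to show that nothing in the body is provider-specific except the model string.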

Choosing a Model

When you migrate, you'll need to swap the model string. GeneralCompute doesn't serve OpenAI's GPT models, so "gpt-4o" won't resolve. You pick from GeneralCompute's available models instead.

A quick mapping by use case:

| Use case | Model string |
|---|---|
| General chat / Q&A | llama-4-maverick |
| Fast, low-cost responses | llama-4-scout |
| Code generation | qwen3-coder |
| Reasoning tasks | qwq-32b |

Run GET /v1/models (or client.models.list()) to get the full list of currently available models.
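
One way to keep model strings out of call sites is a small lookup keyed by use case. The sketch below encodes the mapping from the table above. The dictionary keys and the `pick_model` helper are hypothetical, and the optional `available` set stands in for the IDs returned by GET /v1/models.

```python
from typing import Optional, Set

# Use-case-to-model mapping taken from the table above.
# The keys are labels invented for this sketch.
MODEL_BY_USE_CASE = {
    "general_chat": "llama-4-maverick",
    "fast_low_cost": "llama-4-scout",
    "code_generation": "qwen3-coder",
    "reasoning": "qwq-32b",
}


def pick_model(use_case: str, available: Optional[Set[str]] = None) -> str:
    """Return the model string for a use case, optionally checking it is served."""
    model = MODEL_BY_USE_CASE[use_case]  # raises KeyError for unknown use cases
    if available is not None and model not in available:
        raise ValueError(f"{model!r} is not in the provider's model list")
    return model


print(pick_model("code_generation"))
```

Centralizing the mapping means a later model swap touches one dictionary instead of every request.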

Migrating a Python App

Here's a minimal before/after for a Python app:

Before:

from openai import OpenAI

client = OpenAI(api_key="sk-...")

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Explain attention mechanisms."}],
)
print(response.choices[0].message.content)

After:

from openai import OpenAI

client = OpenAI(
    api_key="gc-...",
    base_url="https://api.generalcompute.com/v1",
)

response = client.chat.completions.create(
    model="llama-4-maverick",  # swap model name
    messages=[{"role": "user", "content": "Explain attention mechanisms."}],
)
print(response.choices[0].message.content)

Two things changed: the constructor arguments and the model string.

Streaming is the same:

with client.chat.completions.create(
    model="llama-4-maverick",
    messages=[{"role": "user", "content": "Write a haiku about compilers."}],
    stream=True,
) as stream:
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:
            print(delta, end="", flush=True)
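
If you need the full text rather than incremental output, the same loop can accumulate deltas. The sketch below is a hypothetical helper exercised against fake chunk objects, so it runs without a network call. It also skips chunks with an empty choices list, which some servers may send (for example, a final usage-only chunk).

```python
from types import SimpleNamespace
from typing import Iterable


def collect_stream(chunks: Iterable) -> str:
    """Join the content deltas of a chat completion stream into one string."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:  # tolerate chunks that carry no choices
            continue
        delta = chunk.choices[0].delta.content
        if delta:  # content can be None on role-only or final chunks
            parts.append(delta)
    return "".join(parts)


def _fake_chunk(text):
    """Build an object with the same attribute path as an SDK stream chunk."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


fake_stream = [
    _fake_chunk("Hello"),
    SimpleNamespace(choices=[]),
    _fake_chunk(None),
    _fake_chunk(", world"),
]
print(collect_stream(fake_stream))  # prints "Hello, world"
```

With a real client, pass the stream object returned by `client.chat.completions.create(..., stream=True)` in place of `fake_stream`.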

Migrating a Node.js App

Same pattern in Node.js:

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.GENERALCOMPUTE_API_KEY,
  baseURL: "https://api.generalcompute.com/v1",
});

const response = await client.chat.completions.create({
  model: "llama-4-maverick",
  messages: [{ role: "user", content: "What is KV cache?" }],
});

console.log(response.choices[0].message.content);

Streaming in Node.js:

const stream = await client.chat.completions.create({
  model: "llama-4-maverick",
  messages: [{ role: "user", content: "List 5 sorting algorithms." }],
  stream: true,
});

for await (const chunk of stream) {
  const delta = chunk.choices[0]?.delta?.content;
  if (delta) process.stdout.write(delta);
}

LangChain Integration

LangChain's ChatOpenAI class accepts openai_api_base and openai_api_key to redirect requests:

from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage

llm = ChatOpenAI(
    model="llama-4-maverick",
    openai_api_key="your_generalcompute_api_key",
    openai_api_base="https://api.generalcompute.com/v1",
)

response = llm.invoke([HumanMessage(content="Summarize the transformer architecture.")])
print(response.content)

For LCEL chains, this slots in as a drop-in replacement:

from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a concise technical writer."),
    ("human", "{question}"),
])

chain = prompt | llm | StrOutputParser()
result = chain.invoke({"question": "How does speculative decoding work?"})
print(result)

LangChain's streaming, callbacks, and async support all work the same way. The ChatOpenAI class handles the protocol; it doesn't care which server it's talking to as long as the API is compatible.

LlamaIndex Integration

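LlamaIndex's OpenAI class takes api_key and api_base, but it looks up context-window metadata from a table of OpenAI model names, so a non-OpenAI model string can raise an error. For third-party endpoints, LlamaIndex provides OpenAILike (in the llama-index-llms-openai-like package), which accepts any model name:

from llama_index.llms.openai_like import OpenAILike
from llama_index.core import Settings

llm = OpenAILike(
    model="llama-4-maverick",
    api_key="your_generalcompute_api_key",
    api_base="https://api.generalcompute.com/v1",
    is_chat_model=True,  # route requests to /chat/completions
)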

Settings.llm = llm

From there, LlamaIndex's query engines, retrievers, and agents use this LLM automatically. For embeddings, LlamaIndex also has an OpenAIEmbedding class that accepts the same api_base override:

from llama_index.embeddings.openai import OpenAIEmbedding

embed_model = OpenAIEmbedding(
    model="text-embedding-3-small",  # or a GeneralCompute embedding model
    api_key="your_generalcompute_api_key",
    api_base="https://api.generalcompute.com/v1",
)

Settings.embed_model = embed_model

Vercel AI SDK

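The Vercel AI SDK uses a createOpenAI helper that accepts a baseURL option. Depending on your AI SDK version, the default model factory may target OpenAI's Responses API rather than /v1/chat/completions; if requests fail, generalcompute.chat("llama-4-maverick") selects the chat completions endpoint explicitly.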

import { createOpenAI } from "@ai-sdk/openai";
import { generateText } from "ai";

const generalcompute = createOpenAI({
  apiKey: process.env.GENERALCOMPUTE_API_KEY,
  baseURL: "https://api.generalcompute.com/v1",
});

const { text } = await generateText({
  model: generalcompute("llama-4-maverick"),
  prompt: "Describe how attention masking works in transformer decoders.",
});

console.log(text);

For streaming in a Next.js API route or Server Action:

import { createOpenAI } from "@ai-sdk/openai";
import { streamText } from "ai";

const generalcompute = createOpenAI({
  apiKey: process.env.GENERALCOMPUTE_API_KEY,
  baseURL: "https://api.generalcompute.com/v1",
});

export async function POST(req: Request) {
  const { messages } = await req.json();

  const result = streamText({
    model: generalcompute("llama-4-maverick"),
    messages,
  });

  return result.toDataStreamResponse();
}

The useChat hook in the browser works unchanged. The stream protocol is identical.

Tool Calling

Tool calling (function calling) works with OpenAI-compatible providers that support it. The request format is the same:

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string"},
                    "units": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                },
                "required": ["city"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="llama-4-maverick",
    messages=[{"role": "user", "content": "What's the weather in Berlin?"}],
    tools=tools,
    tool_choice="auto",
)

if response.choices[0].message.tool_calls:
    tool_call = response.choices[0].message.tool_calls[0]
    print(f"Calling: {tool_call.function.name}")
    print(f"Arguments: {tool_call.function.arguments}")

Check the GeneralCompute docs for which models support tool calling -- not all models in any provider's catalog implement it, so it's worth confirming before building a tool-heavy workflow.
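
Once the model returns a tool call, your code runs the function and sends the result back as a message with role "tool". Below is a minimal sketch of that dispatch step. The `get_weather` body, the registry, and `run_tool_call` are all hypothetical stand-ins, and the fake tool call only mimics the attribute layout of the SDK object.

```python
import json
from types import SimpleNamespace


def get_weather(city: str, units: str = "celsius") -> dict:
    """Stand-in implementation; a real app would call a weather service."""
    return {"city": city, "units": units, "temperature": 18}


# Maps tool names declared in the request to local callables.
TOOL_REGISTRY = {"get_weather": get_weather}


def run_tool_call(tool_call) -> dict:
    """Execute one tool call and build the message to append to the conversation."""
    func = TOOL_REGISTRY[tool_call.function.name]
    args = json.loads(tool_call.function.arguments)  # arguments arrive as a JSON string
    result = func(**args)
    return {
        "role": "tool",
        "tool_call_id": tool_call.id,  # ties the result to the model's request
        "content": json.dumps(result),
    }


fake_call = SimpleNamespace(
    id="call_1",
    function=SimpleNamespace(name="get_weather", arguments='{"city": "Berlin"}'),
)
print(run_tool_call(fake_call))
```

In a real loop you would append the assistant message and this tool message to `messages`, then call `client.chat.completions.create` again so the model can produce its final answer.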

JSON Mode

JSON mode constrains the model to emit syntactically valid JSON. It does not enforce a schema, so spell out the fields you want in the prompt:

import json

response = client.chat.completions.create(
    model="llama-4-maverick",
    messages=[
        {
            "role": "user",
            "content": "Return a JSON object with fields: name, capital, population for France.",
        }
    ],
    response_format={"type": "json_object"},
)

data = json.loads(response.choices[0].message.content)
print(data)
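
Because JSON mode only concerns syntax, it can help to check the parsed object before using it. The following is a sketch of such a check for the three fields requested above; `parse_country` and `REQUIRED_FIELDS` are hypothetical names, and the sample string stands in for `response.choices[0].message.content`.

```python
import json

REQUIRED_FIELDS = ("name", "capital", "population")


def parse_country(raw: str) -> dict:
    """Parse model output and confirm it is an object with the expected fields."""
    data = json.loads(raw)  # raises json.JSONDecodeError on invalid JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [field for field in REQUIRED_FIELDS if field not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return data


# Illustrative payload; the population figure is a placeholder, not real data.
sample = '{"name": "France", "capital": "Paris", "population": 1}'
print(parse_country(sample)["capital"])
```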

If you're using the instructor library for structured outputs with Pydantic, it works by wrapping the OpenAI client, so the same base_url approach applies:

import instructor
from openai import OpenAI
from pydantic import BaseModel

raw_client = OpenAI(
    api_key="your_generalcompute_api_key",
    base_url="https://api.generalcompute.com/v1",
)
client = instructor.from_openai(raw_client)

class Country(BaseModel):
    name: str
    capital: str
    population: int

country = client.chat.completions.create(
    model="llama-4-maverick",
    response_model=Country,
    messages=[{"role": "user", "content": "Tell me about France."}],
)
print(country.model_dump())

Validation Checklist

Before shipping a migration:

[ ] Base URL updated and pointing to the new provider
[ ] API key rotated/updated in all environments (dev, staging, prod)
[ ] Model strings updated to models the new provider actually serves
[ ] Streaming responses tested end-to-end (not just non-streaming)
[ ] Tool calling tested if your app uses it
[ ] Error handling tested with an invalid API key to confirm error shapes match expectations
[ ] Token counting logic reviewed (if you're doing client-side token estimation, different tokenizers may shift counts slightly)

The most common migration issue is forgetting to update the model name. Everything else usually works on the first try.
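
Since stale model names are the usual culprit, a pre-deploy check can compare the model strings your app uses against what the provider lists. This is a minimal sketch; `missing_models` is a hypothetical helper, and the hard-coded ID list stands in for the result of a real models call.

```python
from typing import Iterable, List


def missing_models(required: Iterable[str], available_ids: Iterable[str]) -> List[str]:
    """Return the required model strings the provider does not list."""
    available = set(available_ids)
    return [model for model in required if model not in available]


# With a real client, the IDs could come from:
#   available_ids = [m.id for m in client.models.list()]
available_ids = ["llama-4-maverick", "llama-4-scout", "qwen3-coder", "qwq-32b"]
app_models = ["gpt-4o", "llama-4-maverick"]

print(missing_models(app_models, available_ids))  # prints ['gpt-4o']
```

Running a check like this in CI turns a forgotten model name into a failed build instead of a production error.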

Next Steps

If you're moving from OpenAI to GeneralCompute, the API docs list every supported model, rate limits, and endpoint details. The base URL is https://api.generalcompute.com/v1, and any key you generate on the platform works with the examples above.

For apps where inference speed matters -- voice agents, coding assistants, real-time user-facing features -- the main payoff of switching is speed. The same OpenAI SDK code, pointed at faster infrastructure, can see lower latency without any other changes.