
OpenAI-Compatible APIs: How to Migrate Your App in 5 Minutes

A practical guide to migrating your OpenAI-powered app to a compatible API provider like GeneralCompute -- covering the base URL swap, supported endpoints, and integration code for LangChain, LlamaIndex, and Vercel AI SDK.

Author
General Compute
Published
2026-06-20
Tags
openai compatible api, migration, python, nodejs, langchain, vercel ai sdk, tutorial


If your app already uses the OpenAI API, switching providers takes about five minutes. The OpenAI API has become the standard interface for LLM services, and most alternative providers -- including GeneralCompute -- implement the same REST spec. You change a base URL and an API key, and your existing code keeps working.

This guide covers what "OpenAI-compatible" actually means, which endpoints are supported, and how to update integrations in Python, Node.js, LangChain, LlamaIndex, and Vercel AI SDK.

## What "OpenAI-Compatible" Means

The OpenAI API is a REST interface. Clients send JSON to specific paths like `/v1/chat/completions` and get back responses in a defined shape. When a provider says their API is OpenAI-compatible, they mean they implement those same paths and return responses in the same JSON format.
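To make the wire format concrete, here is a minimal sketch that builds (but does not send) a chat completion request using only the Python standard library. The helper name `build_chat_request` is hypothetical; the path and the `model` and `messages` fields are the ones described above, and the bearer-token `Authorization` header is the scheme the OpenAI API uses.

```python
import json
import urllib.request


def build_chat_request(base_url: str, api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a POST to <base_url>/chat/completions without sending it."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_chat_request(
    "https://api.generalcompute.com/v1",
    "your_generalcompute_api_key",
    "llama-4-maverick",
    "Hello",
)
print(req.get_method(), req.full_url)
```

Only the host portion of the URL differs between providers; the path and the JSON body are what "compatible" refers to.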

This matters because the official OpenAI SDKs for Python and Node.js both accept a base URL override (`base_url` in Python, `baseURL` in Node.js). If the provider's API returns responses that match the OpenAI schema, the SDK handles the rest: streaming, retries, error parsing, type annotations. You never need to write raw HTTP code.

## The Base URL Swap

The Python and Node.js OpenAI SDKs both accept a `base_url` (or `baseURL`) constructor argument. For GeneralCompute:

**Python:**
```python
from openai import OpenAI

client = OpenAI(
    api_key="your_generalcompute_api_key",
    base_url="https://api.generalcompute.com/v1",
)
```

**Node.js:**
```javascript
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.GENERALCOMPUTE_API_KEY,
  baseURL: "https://api.generalcompute.com/v1",
});
```

Everything else -- `client.chat.completions.create(...)`, streaming with `for await`, error handling -- stays identical.

The official OpenAI SDKs read `OPENAI_API_KEY` and `OPENAI_BASE_URL` from the environment when the client is constructed without explicit arguments, and many apps built on them rely on that. In that case:

```bash
export OPENAI_BASE_URL=https://api.generalcompute.com/v1
export OPENAI_API_KEY=your_generalcompute_api_key
```

You don't need to change any client setup code in those cases. The model string still has to change, as covered under "Choosing a Model" below.
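When configuration comes from the environment, it is easy to lose track of which provider a process is actually talking to. The following is a small sketch, not part of any SDK, that reports the effective target at startup. The function name `describe_target` is hypothetical; the two variable names are the ones shown above, and the fallback URL is OpenAI's default endpoint.

```python
import os
from typing import Mapping, Optional

OPENAI_DEFAULT_BASE_URL = "https://api.openai.com/v1"


def describe_target(env: Optional[Mapping[str, str]] = None) -> str:
    """Summarize which endpoint an env-configured client would talk to."""
    env = os.environ if env is None else env
    base_url = env.get("OPENAI_BASE_URL") or OPENAI_DEFAULT_BASE_URL
    key = env.get("OPENAI_API_KEY", "")
    # Show only the last four characters so the full key never lands in logs.
    masked = f"...{key[-4:]}" if key else "(not set)"
    return f"base_url={base_url} api_key={masked}"


print(describe_target({
    "OPENAI_BASE_URL": "https://api.generalcompute.com/v1",
    "OPENAI_API_KEY": "your_generalcompute_api_key",
}))
```

Logging this line once at boot makes a half-migrated deployment (new key, old URL, or the reverse) easier to spot.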

## Supported Endpoints

Most production apps rely on a handful of endpoints. Here's what GeneralCompute supports:

| Endpoint | Status | Notes |
|---|---|---|
| `POST /v1/chat/completions` | Supported | Full streaming, tool calling, JSON mode |
| `POST /v1/completions` | Supported | Legacy text completions |
| `POST /v1/embeddings` | Supported | Multiple embedding models |
| `GET /v1/models` | Supported | Lists available models |
| `POST /v1/audio/transcriptions` | Supported | Whisper-compatible transcription |

Chat completions are where most apps spend their time. The request format is identical: a `messages` array, a `model` string, and optional parameters like `temperature`, `max_tokens`, `stream`, and `tools`.

## Choosing a Model

When you migrate, you'll need to swap the model string. GeneralCompute doesn't serve OpenAI's GPT models, so `"gpt-4o"` won't resolve. You pick from GeneralCompute's available models instead.

A quick mapping by use case:

| Use case | Model string |
|---|---|
| General chat / Q&A | `llama-4-maverick` |
| Fast, low-cost responses | `llama-4-scout` |
| Code generation | `qwen3-coder` |
| Reasoning tasks | `qwq-32b` |

Run `GET /v1/models` (or `client.models.list()`) to get the full list of currently available models.
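One way to keep model strings out of call sites is a small lookup that also checks the choice against the provider's live model list. This is a sketch under assumptions: the use-case keys (`"chat"`, `"fast"`, `"code"`, `"reasoning"`) and the helper name `pick_model` are hypothetical labels for the rows of the table above, and the set of available IDs is passed in rather than fetched.

```python
from typing import Iterable

# Hypothetical short keys for the use cases listed in the table above.
MODEL_BY_USE_CASE = {
    "chat": "llama-4-maverick",
    "fast": "llama-4-scout",
    "code": "qwen3-coder",
    "reasoning": "qwq-32b",
}


def pick_model(use_case: str, available_ids: Iterable[str]) -> str:
    """Return the model string for a use case, failing fast if it is not served."""
    model = MODEL_BY_USE_CASE[use_case]
    if model not in set(available_ids):
        raise LookupError(f"{model!r} is not in the provider's model list")
    return model


print(pick_model("code", ["llama-4-maverick", "qwen3-coder"]))
```

With the OpenAI Python SDK, the available IDs can be collected as `[m.id for m in client.models.list()]`.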

## Migrating a Python App

Here's a minimal before/after for a Python app:

**Before:**
```python
from openai import OpenAI

client = OpenAI(api_key="sk-...")

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Explain attention mechanisms."}],
)
print(response.choices[0].message.content)
```

**After:**
```python
from openai import OpenAI

client = OpenAI(
    api_key="gc-...",
    base_url="https://api.generalcompute.com/v1",
)

response = client.chat.completions.create(
    model="llama-4-maverick",  # swap model name
    messages=[{"role": "user", "content": "Explain attention mechanisms."}],
)
print(response.choices[0].message.content)
```

Only two things changed: the constructor arguments (a new key plus `base_url`) and the model string.

Streaming is the same:

```python
with client.chat.completions.create(
    model="llama-4-maverick",
    messages=[{"role": "user", "content": "Write a haiku about compilers."}],
    stream=True,
) as stream:
    for chunk in stream:
        if not chunk.choices:  # some servers emit chunks with no choices (e.g. a usage chunk)
            continue
        delta = chunk.choices[0].delta.content
        if delta:
            print(delta, end="", flush=True)
```
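If you need the full text as well as the live output (for logging or caching, say), the deltas can be accumulated as they arrive. The sketch below assumes chunks shaped like the ones in the loop above; `collect_stream_text` is a hypothetical helper, and the fake chunks built with `SimpleNamespace` are stand-ins so the example runs without a network call.

```python
from types import SimpleNamespace
from typing import Iterable


def collect_stream_text(chunks: Iterable) -> str:
    """Join the content deltas of a chat completion stream into one string."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:  # tolerate chunks that carry no choices
            continue
        delta = chunk.choices[0].delta.content
        if delta:  # role-only or empty deltas have content None or ""
            parts.append(delta)
    return "".join(parts)


def _fake_chunk(text):
    """Stand-in for an SDK chunk object, for offline illustration only."""
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=text))])


fake_stream = [
    _fake_chunk("Hello"),
    _fake_chunk(None),
    SimpleNamespace(choices=[]),
    _fake_chunk(", world"),
]
print(collect_stream_text(fake_stream))  # Hello, world
```

In real use, pass the object returned by `client.chat.completions.create(..., stream=True)` in place of `fake_stream`.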

## Migrating a Node.js App

Same pattern in Node.js:

```javascript
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.GENERALCOMPUTE_API_KEY,
  baseURL: "https://api.generalcompute.com/v1",
});

const response = await client.chat.completions.create({
  model: "llama-4-maverick",
  messages: [{ role: "user", content: "What is KV cache?" }],
});

console.log(response.choices[0].message.content);
```

Streaming in Node.js:

```javascript
const stream = await client.chat.completions.create({
  model: "llama-4-maverick",
  messages: [{ role: "user", content: "List 5 sorting algorithms." }],
  stream: true,
});

for await (const chunk of stream) {
  const delta = chunk.choices[0]?.delta?.content;
  if (delta) process.stdout.write(delta);
}
```

## LangChain Integration

LangChain's `ChatOpenAI` class accepts `openai_api_base` and `openai_api_key` to redirect requests:

```python
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage

llm = ChatOpenAI(
    model="llama-4-maverick",
    openai_api_key="your_generalcompute_api_key",
    openai_api_base="https://api.generalcompute.com/v1",
)

response = llm.invoke([HumanMessage(content="Summarize the transformer architecture.")])
print(response.content)
```

For LCEL chains, this slots in as a drop-in replacement:

```python
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a concise technical writer."),
    ("human", "{question}"),
])

chain = prompt | llm | StrOutputParser()
result = chain.invoke({"question": "How does speculative decoding work?"})
print(result)
```

LangChain's streaming, callbacks, and async support all work the same way. The `ChatOpenAI` class handles the protocol; it doesn't care which server it's talking to as long as the API is compatible.

## LlamaIndex Integration

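LlamaIndex's `OpenAI` class looks up `model` in a table of OpenAI's own model names (for example, to determine the context window), which can fail for third-party models. For other providers, use the `OpenAILike` class from the `llama-index-llms-openai-like` package, which takes `api_key` and `api_base`:

```python
from llama_index.llms.openai_like import OpenAILike
from llama_index.core import Settings

llm = OpenAILike(
    model="llama-4-maverick",
    api_key="your_generalcompute_api_key",
    api_base="https://api.generalcompute.com/v1",
    is_chat_model=True,  # send requests to /chat/completions
)

Settings.llm = llm
```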

From there, LlamaIndex's query engines, retrievers, and agents use this LLM automatically. For embeddings, LlamaIndex also has an `OpenAIEmbedding` class that accepts the same `api_base` override:

```python
from llama_index.embeddings.openai import OpenAIEmbedding

embed_model = OpenAIEmbedding(
    model="text-embedding-3-small",  # or a GeneralCompute embedding model
    api_key="your_generalcompute_api_key",
    api_base="https://api.generalcompute.com/v1",
)

Settings.embed_model = embed_model
```

## Vercel AI SDK

The Vercel AI SDK uses a `createOpenAI` helper that accepts a `baseURL` option:

```typescript
import { createOpenAI } from "@ai-sdk/openai";
import { generateText } from "ai";

const generalcompute = createOpenAI({
  apiKey: process.env.GENERALCOMPUTE_API_KEY,
  baseURL: "https://api.generalcompute.com/v1",
});

const { text } = await generateText({
  model: generalcompute.chat("llama-4-maverick"), // target the Chat Completions endpoint explicitly
  prompt: "Describe how attention masking works in transformer decoders.",
});

console.log(text);
```

For streaming in a Next.js API route or Server Action:

```typescript
import { createOpenAI } from "@ai-sdk/openai";
import { streamText } from "ai";

const generalcompute = createOpenAI({
  apiKey: process.env.GENERALCOMPUTE_API_KEY,
  baseURL: "https://api.generalcompute.com/v1",
});

export async function POST(req: Request) {
  const { messages } = await req.json();

  const result = streamText({
    model: generalcompute.chat("llama-4-maverick"), // target the Chat Completions endpoint explicitly
    messages,
  });

  return result.toDataStreamResponse();
}
```

The `useChat` hook in the browser works unchanged. The stream protocol is identical.

## Tool Calling

Tool calling (function calling) works with OpenAI-compatible providers that support it. The request format is the same:

```python
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string"},
                    "units": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                },
                "required": ["city"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="llama-4-maverick",
    messages=[{"role": "user", "content": "What's the weather in Berlin?"}],
    tools=tools,
    tool_choice="auto",
)

if response.choices[0].message.tool_calls:
    tool_call = response.choices[0].message.tool_calls[0]
    print(f"Calling: {tool_call.function.name}")
    print(f"Arguments: {tool_call.function.arguments}")
```

Check the [GeneralCompute docs](https://generalcompute.com/docs) for which models support tool calling -- not all models in any provider's catalog implement it, so it's worth confirming before building a tool-heavy workflow.
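The snippet above stops at printing the requested call. A rough sketch of the next step, executing the tool and packaging its result as a `tool` message, might look like the following. The `get_weather` implementation, its canned return value, and the `TOOL_REGISTRY` and `run_tool_call` names are all hypothetical; the fake tool call built with `SimpleNamespace` stands in for `response.choices[0].message.tool_calls[0]` so the example runs offline.

```python
import json
from types import SimpleNamespace


def get_weather(city: str, units: str = "celsius") -> dict:
    """Hypothetical local tool; returns canned data for illustration."""
    return {"city": city, "units": units, "temperature": 18}


TOOL_REGISTRY = {"get_weather": get_weather}


def run_tool_call(tool_call) -> dict:
    """Execute one tool call and build the `tool` message to send back."""
    func = TOOL_REGISTRY[tool_call.function.name]
    kwargs = json.loads(tool_call.function.arguments)  # arguments arrive as a JSON string
    result = func(**kwargs)
    return {
        "role": "tool",
        "tool_call_id": tool_call.id,  # ties the result to the model's request
        "content": json.dumps(result),
    }


# Offline stand-in for a tool call returned by the API.
fake_call = SimpleNamespace(
    id="call_123",
    function=SimpleNamespace(name="get_weather", arguments='{"city": "Berlin"}'),
)
tool_message = run_tool_call(fake_call)
print(tool_message)
```

To finish the round trip, append the assistant message that contained the tool call and then `tool_message` to `messages`, and call `client.chat.completions.create(...)` again so the model can produce its final answer.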

## JSON Mode

JSON mode forces the model to output valid JSON:

```python
import json

response = client.chat.completions.create(
    model="llama-4-maverick",
    messages=[
        {
            "role": "user",
            "content": "Return a JSON object with fields: name, capital, population for France.",
        }
    ],
    response_format={"type": "json_object"},
)

data = json.loads(response.choices[0].message.content)
print(data)
```
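JSON mode is about syntax: it is generally understood to make the reply parseable, not to guarantee any particular set of fields. A small check after parsing catches replies that are valid JSON but not the shape you asked for. This is a sketch; `parse_country_reply` is a hypothetical helper and the required field names are the ones from the prompt above.

```python
import json

REQUIRED_FIELDS = {"name", "capital", "population"}


def parse_country_reply(raw: str) -> dict:
    """Parse a JSON-mode reply and verify the fields requested in the prompt."""
    data = json.loads(raw)  # raises json.JSONDecodeError on malformed output
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    missing = REQUIRED_FIELDS - data.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return data


print(parse_country_reply('{"name": "France", "capital": "Paris", "population": 68000000}'))
```

In real use, pass `response.choices[0].message.content` as `raw`.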

If you're using the `instructor` library for structured outputs with Pydantic, it works by wrapping the OpenAI client, so the same `base_url` approach applies:

```python
import instructor
from openai import OpenAI
from pydantic import BaseModel

raw_client = OpenAI(
    api_key="your_generalcompute_api_key",
    base_url="https://api.generalcompute.com/v1",
)
client = instructor.from_openai(raw_client)

class Country(BaseModel):
    name: str
    capital: str
    population: int

country = client.chat.completions.create(
    model="llama-4-maverick",
    response_model=Country,
    messages=[{"role": "user", "content": "Tell me about France."}],
)
print(country.model_dump())
```

## Validation Checklist

Before shipping a migration:

- [ ] Base URL updated and pointing to the new provider
- [ ] API key rotated/updated in all environments (dev, staging, prod)
- [ ] Model strings updated to models the new provider actually serves
- [ ] Streaming responses tested end-to-end (not just non-streaming)
- [ ] Tool calling tested if your app uses it
- [ ] Error handling tested with an invalid API key to confirm error shapes match expectations
- [ ] Token counting logic reviewed (if you're doing client-side token estimation, different tokenizers may shift counts slightly)

The most common migration issue is forgetting to update the model name. Everything else usually works on the first try.
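Parts of the checklist can be automated. The sketch below exercises the non-streaming and streaming paths against whatever client it is given and returns a pass/fail report. The function name `smoke_test` and the prompt text are hypothetical; the client is assumed to expose `chat.completions.create` the way the OpenAI Python SDK does.

```python
def smoke_test(client, model: str) -> dict:
    """Run one non-streaming and one streaming request; report which succeeded."""
    report = {}
    messages = [{"role": "user", "content": "Reply with the word: ok"}]

    # Non-streaming: a non-empty message body counts as a pass.
    reply = client.chat.completions.create(model=model, messages=messages)
    report["non_streaming"] = bool(reply.choices[0].message.content)

    # Streaming: at least one non-empty content delta counts as a pass.
    streamed = ""
    for chunk in client.chat.completions.create(model=model, messages=messages, stream=True):
        if chunk.choices and chunk.choices[0].delta.content:
            streamed += chunk.choices[0].delta.content
    report["streaming"] = bool(streamed)
    return report
```

Running `smoke_test(client, "llama-4-maverick")` in each environment (dev, staging, prod) after the swap gives a quick signal that the URL, key, and model string all line up. Tool calling and error-shape checks would need their own cases.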

## Next Steps

If you're moving from OpenAI to GeneralCompute, the [API docs](https://generalcompute.com/docs) list every supported model, rate limits, and endpoint details. The base URL is `https://api.generalcompute.com/v1`, and any key you generate on the platform works with the examples above.

For apps where inference speed matters -- voice agents, coding assistants, real-time user-facing features -- the main payoff of switching is speed: the same OpenAI SDK code, pointed at faster infrastructure, can return its first token sooner and generate tokens faster without any other changes.
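Latency claims are worth checking on your own workload. One common metric for interactive apps is time to first token. The sketch below measures it for any streaming call; `measure_ttft` is a hypothetical helper, and it takes a zero-argument callable so that the timer starts before the request is issued.

```python
import time
from typing import Callable, Iterable, Optional


def measure_ttft(start_stream: Callable[[], Iterable]) -> Optional[float]:
    """Seconds from issuing the request to the first non-empty content delta.

    Returns None if the stream ends without producing any content.
    """
    start = time.perf_counter()
    for chunk in start_stream():
        if chunk.choices and chunk.choices[0].delta.content:
            return time.perf_counter() - start
    return None
```

A typical call, assuming a `client` configured as in the earlier examples, would be `measure_ttft(lambda: client.chat.completions.create(model="llama-4-maverick", messages=messages, stream=True))`. Run it several times against each provider and compare medians rather than single samples.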