How to Migrate From OpenAI to GeneralCompute in 10 Lines of Code

GeneralCompute's API is fully OpenAI-compatible. Here's exactly what to change in Python, Node.js, and LangChain, plus a validation checklist to make sure nothing breaks.

Author: General Compute
Published: 2026-06-24
Tags: migrate openai sdk, python, nodejs, langchain, tutorial, api

GeneralCompute's API is OpenAI-compatible. That means if your app already calls OpenAI, the migration is mostly a matter of changing a base URL and swapping an API key. You don't need to rewrite your prompt logic, message formatting, streaming setup, or error handling.

This guide covers the exact diffs for Python, Node.js, and LangChain, along with a validation checklist for confirming the migration worked.

## Why the Migration Is Short

The OpenAI SDK -- both the Python and JavaScript versions -- accepts a `base_url` (or `baseURL`) parameter that overrides where requests go. Most libraries built on top of the OpenAI API accept the same parameter. Because GeneralCompute implements the same REST interface (same endpoints, same request/response shapes, same streaming protocol), pointing your existing client at `https://api.generalcompute.com/v1` is enough to switch providers.

The only things you need to update are:

1. The base URL
2. Your API key
3. The model name (if you're switching to a model with a different identifier)
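
To make the list above concrete, here is a minimal sketch of the HTTP request an OpenAI-style client sends, built with the standard library only and never actually sent. The `/chat/completions` path and `Bearer` authorization header are the OpenAI conventions; that GeneralCompute accepts them unchanged is assumed from the compatibility claim in this guide. `build_chat_request` is a hypothetical helper name.

```python
import json
import urllib.request

BASE_URL = "https://api.generalcompute.com/v1"  # base URL from this guide
API_KEY = "your-generalcompute-api-key"         # placeholder, not a real key
MODEL = "llama-4-maverick"                      # model name used in this guide


def build_chat_request(base_url: str, api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completions request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_chat_request(BASE_URL, API_KEY, MODEL, "Explain KV caching in one paragraph.")
print(request.full_url)  # https://api.generalcompute.com/v1/chat/completions
```

Only the base URL, the key, and the model name are provider-specific; the payload structure is the part that stays the same.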

## Python Migration

### Before

```python
from openai import OpenAI

client = OpenAI(
    api_key="sk-..."
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Explain KV caching in one paragraph."}]
)
print(response.choices[0].message.content)
```

### After

```python
from openai import OpenAI

client = OpenAI(
    api_key="your-generalcompute-api-key",
    base_url="https://api.generalcompute.com/v1"
)

response = client.chat.completions.create(
    model="llama-4-maverick",
    messages=[{"role": "user", "content": "Explain KV caching in one paragraph."}]
)
print(response.choices[0].message.content)
```

Three lines changed: `api_key`, `base_url`, and `model`. Everything else -- message format, response parsing, error handling -- stays identical.

### Streaming in Python

Streaming works the same way:

```python
from openai import OpenAI

client = OpenAI(
    api_key="your-generalcompute-api-key",
    base_url="https://api.generalcompute.com/v1"
)

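stream = client.chat.completions.create(
    model="llama-4-maverick",
    messages=[{"role": "user", "content": "Write a haiku about latency."}],
    stream=True,  # ask for incremental chunks instead of one final response
)
for chunk in stream:
    # Skip chunks that carry no choices or no text (e.g. the final chunk)
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```

The `stream=True` flag, the chunk iterator, and the flush pattern all carry over unchanged.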

### Using Environment Variables

The cleaner approach for production code is to keep provider config in environment variables:

```python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["GENERALCOMPUTE_API_KEY"],
    base_url=os.environ.get("LLM_BASE_URL", "https://api.generalcompute.com/v1")
)
```

This pattern also makes it easy to switch back to OpenAI for testing -- just set different env vars rather than touching code.
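
One way to make that switch explicit is a small settings resolver keyed on a provider name. This is a sketch under assumptions: the `LLM_PROVIDER` and `LLM_MODEL` variable names, the `PROVIDERS` table, and `load_llm_settings` are hypothetical and not part of any SDK. The GeneralCompute values come from this guide, and the OpenAI URL is the SDK's default endpoint.

```python
import os
from typing import Mapping, Optional

# Hypothetical provider table used only for this illustration.
PROVIDERS = {
    "generalcompute": {
        "base_url": "https://api.generalcompute.com/v1",
        "key_var": "GENERALCOMPUTE_API_KEY",
        "model": "llama-4-maverick",
    },
    "openai": {
        "base_url": "https://api.openai.com/v1",
        "key_var": "OPENAI_API_KEY",
        "model": "gpt-4o",
    },
}


def load_llm_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Resolve api_key, base_url, and model from environment variables."""
    env = os.environ if env is None else env
    name = env.get("LLM_PROVIDER", "generalcompute")
    if name not in PROVIDERS:
        raise ValueError(f"Unknown LLM_PROVIDER: {name!r}")
    provider = PROVIDERS[name]
    return {
        "api_key": env[provider["key_var"]],  # raises KeyError if the key is unset
        "base_url": env.get("LLM_BASE_URL", provider["base_url"]),
        "model": env.get("LLM_MODEL", provider["model"]),
    }


# Demo with an explicit mapping instead of the real environment.
settings = load_llm_settings({"LLM_PROVIDER": "openai", "OPENAI_API_KEY": "sk-test"})
print(settings["base_url"])  # https://api.openai.com/v1
```

The resulting `api_key` and `base_url` can be passed straight to `OpenAI(...)`, and `model` to each `create` call, so switching providers comes down to changing `LLM_PROVIDER`.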

## Node.js / TypeScript Migration

### Before

```typescript
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});

const response = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "What is speculative decoding?" }],
});

console.log(response.choices[0].message.content);
```

### After

```typescript
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.GENERALCOMPUTE_API_KEY,
  baseURL: "https://api.generalcompute.com/v1",
});

const response = await client.chat.completions.create({
  model: "llama-4-maverick",
  messages: [{ role: "user", content: "What is speculative decoding?" }],
});

console.log(response.choices[0].message.content);
```

Three lines changed: `apiKey`, `baseURL`, and `model`. Note that the Node.js SDK uses `baseURL` (camelCase) while the Python SDK uses `base_url` (snake_case).

### Streaming in Node.js

```typescript
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.GENERALCOMPUTE_API_KEY,
  baseURL: "https://api.generalcompute.com/v1",
});

const stream = client.chat.completions.stream({
  model: "llama-4-maverick",
  messages: [{ role: "user", content: "Describe transformer attention." }],
});

for await (const chunk of stream) {
  const text = chunk.choices[0]?.delta?.content ?? "";
  process.stdout.write(text);
}
```

The async iterator pattern works identically. If you're using `.on("content", ...)` event handlers or `stream.finalMessage()`, those work too.

## LangChain Migration

LangChain's `ChatOpenAI` class accepts the same `base_url` and `api_key` overrides.

### Before

```python
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="gpt-4o",
    openai_api_key="sk-..."
)

result = llm.invoke("What are the trade-offs of quantization?")
print(result.content)
```

### After

```python
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="llama-4-maverick",
    openai_api_key="your-generalcompute-api-key",
    openai_api_base="https://api.generalcompute.com/v1"
)

result = llm.invoke("What are the trade-offs of quantization?")
print(result.content)
```

This example uses `openai_api_base` and `openai_api_key`, the field names LangChain defines on its wrapper around the OpenAI client; recent `langchain_openai` releases also accept `base_url` and `api_key` as aliases. Everything downstream -- chains, memory, callbacks, streaming -- works without modification.

### LangChain Streaming

```python
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage

llm = ChatOpenAI(
    model="llama-4-maverick",
    openai_api_key="your-generalcompute-api-key",
    openai_api_base="https://api.generalcompute.com/v1",
    streaming=True
)

for chunk in llm.stream([HumanMessage(content="Explain MoE routing.")]):
    print(chunk.content, end="", flush=True)
```

### LangChain with LCEL

If you're using LangChain Expression Language, nothing changes in your chain definitions:

```python
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

llm = ChatOpenAI(
    model="llama-4-maverick",
    openai_api_key="your-generalcompute-api-key",
    openai_api_base="https://api.generalcompute.com/v1"
)

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful engineering assistant."),
    ("human", "{question}")
])

chain = prompt | llm | StrOutputParser()

print(chain.invoke({"question": "How does continuous batching work?"}))
```

The `|` chaining syntax, prompt templates, and output parsers all remain the same.

## Other Libraries

Most libraries that wrap the OpenAI API follow the same pattern:

**LlamaIndex:**
```python
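# OpenAILike (package: llama-index-llms-openai-like) is LlamaIndex's class for
# OpenAI-compatible endpoints; the stock OpenAI class rejects model names it
# does not recognize.
from llama_index.llms.openai_like import OpenAILike

llm = OpenAILike(
    model="llama-4-maverick",
    api_key="your-generalcompute-api-key",
    api_base="https://api.generalcompute.com/v1",
    is_chat_model=True,  # route calls through the chat completions endpoint
)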
```

**Instructor (structured outputs):**
```python
import instructor
from openai import OpenAI

client = instructor.from_openai(
    OpenAI(
        api_key="your-generalcompute-api-key",
        base_url="https://api.generalcompute.com/v1"
    )
)
```

**Vercel AI SDK:**
```typescript
import { createOpenAI } from "@ai-sdk/openai";

const generalcompute = createOpenAI({
  apiKey: process.env.GENERALCOMPUTE_API_KEY,
  baseURL: "https://api.generalcompute.com/v1",
});
```

The pattern is consistent: find the parameter that sets the API base URL, point it at GeneralCompute, and swap in your key.

## Validation Checklist

After making the change, run through this checklist before deploying:

**Basic connectivity**
- [ ] A simple non-streaming completion returns a response
- [ ] Response has the expected shape (`choices[0].message.content`)
- [ ] No authentication errors (confirm your API key is correct)

**Streaming**
- [ ] Streaming completions produce chunks incrementally
- [ ] Stream terminates cleanly (no hanging connections)
- [ ] `finish_reason` is present on the final chunk

**Your specific use case**
- [ ] System prompts work as expected
- [ ] Multi-turn conversations (passing a `messages` array with history) produce coherent responses
- [ ] Tool/function calling works if your app uses it
- [ ] JSON mode works if your app uses `response_format: { type: "json_object" }`

**Error handling**
- [ ] Rate limit errors surface as expected exceptions
- [ ] Invalid model names return a clear error (not a silent failure)

**Performance**
- [ ] TTFT (time to first token) meets your expectations
- [ ] Token generation speed is acceptable for your use case

If anything in the checklist fails, the most common causes are: a mismatched model name (check the [GeneralCompute docs](https://generalcompute.com) for supported model identifiers), a base URL with a trailing slash that conflicts with path construction, or a library that hard-codes the OpenAI domain somewhere you haven't overridden yet.

## Common Gotchas

**Trailing slash in base URL.** Some SDKs are sensitive to whether `base_url` ends with `/v1` or `/v1/`. If you get 404s on endpoints, try removing or adding the trailing slash.
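
If you build URLs yourself or read the base URL from configuration, normalizing it once avoids the ambiguity. A minimal sketch, with `normalize_base_url` and `join_endpoint` as hypothetical helper names:

```python
def normalize_base_url(url: str) -> str:
    """Strip surrounding whitespace and trailing slashes so paths join predictably."""
    return url.strip().rstrip("/")


def join_endpoint(base_url: str, path: str) -> str:
    """Join a base URL and an endpoint path with exactly one slash between them."""
    return f"{normalize_base_url(base_url)}/{path.lstrip('/')}"


print(join_endpoint("https://api.generalcompute.com/v1/", "/chat/completions"))
# https://api.generalcompute.com/v1/chat/completions
```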

**Library-specific parameter names.** As shown above, different libraries use `base_url`, `baseURL`, `api_base`, or `openai_api_base`. Check the library's docs if the standard override isn't working.

**Model name format.** OpenAI uses names like `gpt-4o` and `gpt-4o-mini`. GeneralCompute uses names like `llama-4-maverick` and `llama-4-scout`. Make sure you've updated the `model` parameter everywhere it appears. A stale model name should come back as an explicit API error rather than a silent failure, but it's easy to miss until the first request.
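
A startup check can catch a stale model name before the first real request. This sketch relies on the OpenAI SDK's `client.models.list()` call and assumes GeneralCompute serves the corresponding models endpoint, which this guide does not confirm. `list_model_ids` and `require_model` are hypothetical helper names, and the fake client exists only so the example runs offline.

```python
from types import SimpleNamespace
from typing import Any, List


def list_model_ids(client: Any) -> List[str]:
    """Return the sorted model identifiers reported by the provider."""
    return sorted(model.id for model in client.models.list())


def require_model(client: Any, model_id: str) -> None:
    """Raise a descriptive error if the provider does not list model_id."""
    available = list_model_ids(client)
    if model_id not in available:
        raise ValueError(
            f"Model {model_id!r} not found. Available: {', '.join(available)}"
        )


# Offline stand-in that mimics client.models.list(); swap in a real client to use it.
fake_client = SimpleNamespace(
    models=SimpleNamespace(
        list=lambda: [
            SimpleNamespace(id="llama-4-scout"),
            SimpleNamespace(id="llama-4-maverick"),
        ]
    )
)

require_model(fake_client, "llama-4-maverick")  # passes silently
print(list_model_ids(fake_client))  # ['llama-4-maverick', 'llama-4-scout']
```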

**Embeddings.** If your app calls `client.embeddings.create(...)`, note that embedding models have different identifiers. Check the docs for the embedding model name you need.

**Hardcoded URLs.** Search your codebase for `api.openai.com` to catch any direct HTTP calls that bypass your client configuration.
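
If `grep` isn't convenient, the same search can be scripted. A minimal sketch, in which the suffix list, the skipped directory names, and `find_hardcoded_urls` are illustrative choices rather than a complete audit:

```python
from pathlib import Path
from typing import Iterator, Tuple

NEEDLE = "api.openai.com"
SOURCE_SUFFIXES = {".py", ".js", ".ts", ".tsx", ".json", ".yaml", ".yml", ".toml"}
SKIPPED_DIRS = {"node_modules", ".git", ".venv"}


def find_hardcoded_urls(root: str) -> Iterator[Tuple[Path, int, str]]:
    """Yield (path, line number, line) for each line that mentions NEEDLE."""
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in SOURCE_SUFFIXES:
            continue
        if any(part in SKIPPED_DIRS for part in path.parts):
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip unreadable or non-text files
        for number, line in enumerate(text.splitlines(), start=1):
            if NEEDLE in line:
                yield path, number, line.strip()


# Example usage from a project root:
#   for path, number, line in find_hardcoded_urls("."):
#       print(f"{path}:{number}: {line}")
```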

## Getting Started

Get a GeneralCompute API key at [generalcompute.com](https://generalcompute.com). The migration is small enough that it's worth running in a branch and A/B testing the response quality before fully committing. Most teams find the behavior is equivalent for standard chat and completion tasks, with meaningfully faster token generation.

If you run into anything that behaves differently from what you expected, the [GeneralCompute docs](https://generalcompute.com) cover the full API surface including supported parameters and model options.