How to Migrate From OpenAI to GeneralCompute in 10 Lines of Code

GeneralCompute's API is OpenAI-compatible. That means if your app already calls OpenAI, the migration is mostly a matter of changing a base URL and swapping an API key. You don't need to rewrite your prompt logic, message formatting, streaming setup, or error handling.

This guide covers the exact diffs for Python, Node.js, and LangChain, along with a validation checklist for confirming the migration worked.

Why the Migration Is Short

The OpenAI SDK -- both the Python and JavaScript versions -- accepts a base_url (or baseURL) parameter that overrides where requests go. Most libraries built on top of the OpenAI API accept the same parameter. Because GeneralCompute implements the same REST interface (same endpoints, same request/response shapes, same streaming protocol), pointing your existing client at https://api.generalcompute.com/v1 is enough to switch providers.

The only things you need to update are:

The base URL
Your API key
The model name (if you're switching to a model with a different identifier)
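
To see why so little changes, it can help to look at the request itself. The sketch below builds, without sending, the kind of HTTP request an OpenAI-compatible client issues. The /chat/completions path and the Bearer header follow OpenAI's REST conventions; build_chat_request is a hypothetical helper name used only for illustration.

```python
import json
import urllib.request


def build_chat_request(base_url: str, api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completion request."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Same helper, same body shape; only the three values from the list differ.
openai_req = build_chat_request("https://api.openai.com/v1", "sk-...", "gpt-4o", "Hello")
gc_req = build_chat_request(
    "https://api.generalcompute.com/v1",
    "your-generalcompute-api-key",
    "llama-4-maverick",
    "Hello",
)

print(openai_req.full_url)  # https://api.openai.com/v1/chat/completions
print(gc_req.full_url)      # https://api.generalcompute.com/v1/chat/completions
```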

Python Migration

Before

from openai import OpenAI

client = OpenAI(
    api_key="sk-..."
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Explain KV caching in one paragraph."}]
)
print(response.choices[0].message.content)

After

from openai import OpenAI

client = OpenAI(
    api_key="your-generalcompute-api-key",
    base_url="https://api.generalcompute.com/v1"
)

response = client.chat.completions.create(
    model="llama-4-maverick",
    messages=[{"role": "user", "content": "Explain KV caching in one paragraph."}]
)
print(response.choices[0].message.content)

Three lines changed: api_key, base_url, and model. Everything else -- message format, response parsing, error handling -- stays identical.

Streaming in Python

Streaming works the same way:

from openai import OpenAI

client = OpenAI(
    api_key="your-generalcompute-api-key",
    base_url="https://api.generalcompute.com/v1"
)

stream = client.chat.completions.create(
    model="llama-4-maverick",
    messages=[{"role": "user", "content": "Write a haiku about latency."}],
    stream=True
)
for chunk in stream:
    # Some chunks carry no choices or an empty delta; skip those.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

The stream=True flag, the chunk iterator, and the flush pattern all carry over unchanged.

Using Environment Variables

The cleaner approach for production code is to keep provider config in environment variables:

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["GENERALCOMPUTE_API_KEY"],
    base_url=os.environ.get("LLM_BASE_URL", "https://api.generalcompute.com/v1")
)

This pattern also makes it easy to switch back to OpenAI for testing -- just set different env vars rather than touching code.
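
One way to make the switch fully code-free is to read the model name from the environment as well. The sketch below is a minimal version of that idea; LLM_API_KEY and LLM_MODEL are hypothetical variable names chosen for illustration, while LLM_BASE_URL and GENERALCOMPUTE_API_KEY match the snippet above.

```python
import os

DEFAULT_BASE_URL = "https://api.generalcompute.com/v1"
DEFAULT_MODEL = "llama-4-maverick"


def load_llm_settings(env=None) -> dict:
    """Collect provider settings from environment variables.

    LLM_API_KEY and LLM_MODEL are assumed names, not an established convention.
    """
    env = os.environ if env is None else env
    # Prefer the provider-neutral key, fall back to the GeneralCompute one.
    api_key = env.get("LLM_API_KEY") or env.get("GENERALCOMPUTE_API_KEY")
    if not api_key:
        raise RuntimeError("Set LLM_API_KEY or GENERALCOMPUTE_API_KEY")
    return {
        "api_key": api_key,
        "base_url": env.get("LLM_BASE_URL", DEFAULT_BASE_URL),
        "model": env.get("LLM_MODEL", DEFAULT_MODEL),
    }


# Demo with an explicit mapping so the example runs without real env vars.
settings = load_llm_settings({"GENERALCOMPUTE_API_KEY": "your-generalcompute-api-key"})
print(settings["base_url"])  # https://api.generalcompute.com/v1
print(settings["model"])     # llama-4-maverick
```

The client would then be built with OpenAI(api_key=settings["api_key"], base_url=settings["base_url"]), and settings["model"] passed to each request.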

Node.js / TypeScript Migration

Before

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});

const response = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "What is speculative decoding?" }],
});

console.log(response.choices[0].message.content);

After

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.GENERALCOMPUTE_API_KEY,
  baseURL: "https://api.generalcompute.com/v1",
});

const response = await client.chat.completions.create({
  model: "llama-4-maverick",
  messages: [{ role: "user", content: "What is speculative decoding?" }],
});

console.log(response.choices[0].message.content);

Three lines changed: apiKey, baseURL, and model. Note that the Node.js SDK uses baseURL (camelCase) while the Python SDK uses base_url (snake_case).

Streaming in Node.js

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.GENERALCOMPUTE_API_KEY,
  baseURL: "https://api.generalcompute.com/v1",
});

const stream = client.chat.completions.stream({
  model: "llama-4-maverick",
  messages: [{ role: "user", content: "Describe transformer attention." }],
});

for await (const chunk of stream) {
  const text = chunk.choices[0]?.delta?.content ?? "";
  process.stdout.write(text);
}

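The async iterator pattern works identically. If you're using .on("content", ...) event handlers or stream.finalMessage(), those work too. On older versions of the openai package, the .stream() helper may be exposed as client.beta.chat.completions.stream instead, so check which one your installed version provides.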

LangChain Migration

LangChain's ChatOpenAI class accepts equivalent overrides for the base URL and API key, under its own parameter names.

Before

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="gpt-4o",
    openai_api_key="sk-..."
)

result = llm.invoke("What are the trade-offs of quantization?")
print(result.content)

After

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="llama-4-maverick",
    openai_api_key="your-generalcompute-api-key",
    openai_api_base="https://api.generalcompute.com/v1"
)

result = llm.invoke("What are the trade-offs of quantization?")
print(result.content)

The parameters are openai_api_base and openai_api_key here because LangChain wraps the OpenAI client and exposes its own parameter names; recent langchain_openai releases also accept base_url and api_key as aliases. Everything downstream -- chains, memory, callbacks, streaming -- works without modification.

LangChain Streaming

from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage

llm = ChatOpenAI(
    model="llama-4-maverick",
    openai_api_key="your-generalcompute-api-key",
    openai_api_base="https://api.generalcompute.com/v1",
    streaming=True
)

for chunk in llm.stream([HumanMessage(content="Explain MoE routing.")]):
    print(chunk.content, end="", flush=True)

LangChain with LCEL

If you're using LangChain Expression Language, nothing changes in your chain definitions:

from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

llm = ChatOpenAI(
    model="llama-4-maverick",
    openai_api_key="your-generalcompute-api-key",
    openai_api_base="https://api.generalcompute.com/v1"
)

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful engineering assistant."),
    ("human", "{question}")
])

chain = prompt | llm | StrOutputParser()

print(chain.invoke({"question": "How does continuous batching work?"}))

The | chaining syntax, prompt templates, and output parsers all remain the same.

Other Libraries

Most libraries that wrap the OpenAI API follow the same pattern:

LlamaIndex:

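# OpenAILike (package: llama-index-llms-openai-like) targets OpenAI-compatible
# endpoints. The stock OpenAI class looks up context sizes by model name and
# may reject identifiers it does not recognize.
from llama_index.llms.openai_like import OpenAILike

llm = OpenAILike(
    model="llama-4-maverick",
    api_key="your-generalcompute-api-key",
    api_base="https://api.generalcompute.com/v1",
    is_chat_model=True
)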

Instructor (structured outputs):

import instructor
from openai import OpenAI

client = instructor.from_openai(
    OpenAI(
        api_key="your-generalcompute-api-key",
        base_url="https://api.generalcompute.com/v1"
    )
)

Vercel AI SDK:

import { createOpenAI } from "@ai-sdk/openai";

const generalcompute = createOpenAI({
  apiKey: process.env.GENERALCOMPUTE_API_KEY,
  baseURL: "https://api.generalcompute.com/v1",
});

The pattern is consistent: find the parameter that sets the API base URL, point it at GeneralCompute, and swap in your key.

Validation Checklist

After making the change, run through this checklist before deploying:

Basic connectivity

[ ] A simple non-streaming completion returns a response
[ ] Response has the expected shape (choices[0].message.content)
[ ] No authentication errors (confirm your API key is correct)
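
The shape check can be automated with a small validator. This is a sketch that works on a plain dict, which in the OpenAI Python SDK (v1 and later) can be obtained with response.model_dump(); check_completion_shape is a hypothetical helper name, and it assumes a plain text completion rather than a tool call.

```python
def check_completion_shape(payload: dict) -> list[str]:
    """Return a list of problems found in a chat completion payload.

    An empty list means the payload has the expected shape.
    """
    choices = payload.get("choices")
    if not isinstance(choices, list) or not choices:
        return ["missing or empty 'choices'"]

    message = choices[0].get("message")
    if not isinstance(message, dict):
        return ["choices[0].message is missing"]

    problems = []
    if message.get("role") != "assistant":
        problems.append("choices[0].message.role is not 'assistant'")
    content = message.get("content")
    if not isinstance(content, str) or not content.strip():
        problems.append("choices[0].message.content is empty")
    return problems


sample = {"choices": [{"message": {"role": "assistant", "content": "Hello!"}}]}
print(check_completion_shape(sample))  # []
```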

Streaming

[ ] Streaming completions produce chunks incrementally
[ ] Stream terminates cleanly (no hanging connections)
[ ] finish_reason is present on the final chunk
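
The streaming items can be checked the same way. The sketch below summarizes a sequence of chunk dicts in the OpenAI streaming shape (choices[0].delta.content plus finish_reason); summarize_stream is a hypothetical helper, and the fake chunks stand in for a real stream.

```python
def summarize_stream(chunks) -> dict:
    """Summarize an iterable of chat completion chunk dicts."""
    content_chunks = 0
    text_parts = []
    finish_reason = None
    for chunk in chunks:
        # A chunk may have no choices at all (for example a usage-only chunk).
        for choice in chunk.get("choices") or []:
            piece = (choice.get("delta") or {}).get("content")
            if piece:
                content_chunks += 1
                text_parts.append(piece)
            if choice.get("finish_reason") is not None:
                finish_reason = choice["finish_reason"]
    return {
        "content_chunks": content_chunks,
        "text": "".join(text_parts),
        "finish_reason": finish_reason,
        # More than one content chunk suggests output arrived incrementally.
        "incremental": content_chunks > 1,
    }


fake_chunks = [
    {"choices": [{"delta": {"role": "assistant"}, "finish_reason": None}]},
    {"choices": [{"delta": {"content": "Hel"}, "finish_reason": None}]},
    {"choices": [{"delta": {"content": "lo"}, "finish_reason": None}]},
    {"choices": [{"delta": {}, "finish_reason": "stop"}]},
]
print(summarize_stream(fake_chunks)["finish_reason"])  # stop
```

Against a live stream from the Python SDK, the input could be (chunk.model_dump() for chunk in stream). If the loop never returns, that points to the "hanging connection" item.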

Your specific use case

[ ] System prompts work as expected
[ ] Multi-turn conversations (passing a messages array with history) produce coherent responses
[ ] Tool/function calling works if your app uses it
[ ] JSON mode works if your app uses response_format: { type: "json_object" }

Error handling

[ ] Rate limit errors surface as expected exceptions
[ ] Invalid model names return a clear error (not a silent failure)

Performance

[ ] TTFT (time to first token) meets your expectations
[ ] Token generation speed is acceptable for your use case
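
For the performance items, a rough timing harness is usually enough. The sketch below measures time to first piece and pieces per second over any iterable of text pieces; measure_stream is a hypothetical helper, and note that streamed pieces are not the same as tokens, so treat the rate as an approximation.

```python
import time


def measure_stream(pieces, clock=time.perf_counter) -> dict:
    """Time an iterable of streamed text pieces.

    Returns time to the first non-empty piece and the piece rate after it.
    """
    start = clock()
    first_at = None
    count = 0
    for piece in pieces:
        if not piece:
            continue  # ignore empty deltas
        if first_at is None:
            first_at = clock()
        count += 1
    end = clock()

    if first_at is None:
        return {"ttft_s": None, "pieces": 0, "pieces_per_s": None}
    generation_time = end - first_at
    rate = count / generation_time if generation_time > 0 else None
    return {"ttft_s": first_at - start, "pieces": count, "pieces_per_s": rate}


def slow_pieces():
    """Stand-in for a real stream: yields text with small delays."""
    for word in ["low ", "latency ", "matters"]:
        time.sleep(0.01)
        yield word


print(measure_stream(slow_pieces()))
```

Running the same harness against both providers with the same prompt gives a like-for-like comparison.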

If anything in the checklist fails, the most common causes are: a mismatched model name (check the GeneralCompute docs for supported model identifiers), a base URL with a trailing slash that conflicts with path construction, or a library that hard-codes the OpenAI domain somewhere you haven't overridden yet.

Common Gotchas

Trailing slash in base URL. Some SDKs are sensitive to whether base_url ends with /v1 or /v1/. If you get 404s on endpoints, try removing or adding the trailing slash.
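
If the base URL comes from configuration you don't fully control, normalizing it once avoids the problem. A minimal sketch, with normalize_base_url and join_endpoint as hypothetical helper names:

```python
def normalize_base_url(url: str) -> str:
    """Strip surrounding whitespace and any trailing slashes."""
    return url.strip().rstrip("/")


def join_endpoint(base_url: str, path: str) -> str:
    """Join a base URL and an endpoint path with exactly one slash between them."""
    return normalize_base_url(base_url) + "/" + path.lstrip("/")


print(normalize_base_url("https://api.generalcompute.com/v1/"))
# https://api.generalcompute.com/v1
print(join_endpoint("https://api.generalcompute.com/v1/", "/chat/completions"))
# https://api.generalcompute.com/v1/chat/completions
```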

Library-specific parameter names. As shown above, different libraries use base_url, baseURL, api_base, or openai_api_base. Check the library's docs if the standard override isn't working.

Model name format. OpenAI uses names like gpt-4o and gpt-4o-mini. GeneralCompute uses names like llama-4-maverick and llama-4-scout. Make sure you've updated the model parameter everywhere it appears. A stale model name should come back as an explicit error rather than failing silently, but the change is easy to miss.

Embeddings. If your app calls client.embeddings.create(...), note that embedding models have different identifiers. Check the docs for the embedding model name you need.

Hardcoded URLs. Search your codebase for api.openai.com to catch any direct HTTP calls that bypass your client configuration.
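
A plain grep works for this; for a check that can run in CI, the same search can be sketched in a few lines of Python. find_hardcoded_urls and the skipped directory names are illustrative choices, not part of any tool.

```python
from pathlib import Path

NEEDLE = "api.openai.com"
SKIP_DIRS = {".git", "node_modules", ".venv", "__pycache__"}


def find_hardcoded_urls(root, needle: str = NEEDLE) -> list[tuple[str, int, str]]:
    """Return (path, line number, line text) for every line containing needle."""
    root = Path(root)
    hits = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        # Skip vendored and generated directories below the root.
        if SKIP_DIRS.intersection(path.relative_to(root).parts):
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # binary or unreadable file
        for lineno, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                hits.append((str(path), lineno, line.strip()))
    return hits


if __name__ == "__main__":
    for file_path, lineno, line in find_hardcoded_urls("."):
        print(f"{file_path}:{lineno}: {line}")
```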

Getting Started

Get a GeneralCompute API key at generalcompute.com. The migration is small enough that it's worth running in a branch and A/B testing the response quality before fully committing. Most teams find the behavior is equivalent for standard chat and completion tasks, with meaningfully faster token generation.

If you run into anything that behaves differently from what you expected, the GeneralCompute docs cover the full API surface including supported parameters and model options.