How to Migrate From OpenAI to GeneralCompute in 10 Lines of Code
GeneralCompute's API is OpenAI-compatible. That means if your app already calls OpenAI, the migration is mostly a matter of changing a base URL and swapping an API key. You don't need to rewrite your prompt logic, message formatting, streaming setup, or error handling.
This guide covers the exact diffs for Python, Node.js, and LangChain, along with a validation checklist for confirming the migration worked.
Why the Migration Is Short
The OpenAI SDK -- both the Python and JavaScript versions -- accepts a base_url (or baseURL) parameter that overrides where requests go. Most libraries built on top of the OpenAI API accept the same parameter. Because GeneralCompute implements the same REST interface (same endpoints, same request/response shapes, same streaming protocol), pointing your existing client at https://api.generalcompute.com/v1 is enough to switch providers.
The only things you need to update are:
- The base URL
- Your API key
- The model name (if you're switching to a model with a different identifier)
Python Migration
Before
    from openai import OpenAI

    client = OpenAI(
        api_key="sk-..."
    )

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Explain KV caching in one paragraph."}]
    )

    print(response.choices[0].message.content)
After
    from openai import OpenAI

    client = OpenAI(
        api_key="your-generalcompute-api-key",
        base_url="https://api.generalcompute.com/v1"
    )

    response = client.chat.completions.create(
        model="llama-4-maverick",
        messages=[{"role": "user", "content": "Explain KV caching in one paragraph."}]
    )

    print(response.choices[0].message.content)
Three things changed: the api_key value, the added base_url argument, and the model name. Everything else -- message format, response parsing, error handling -- stays identical.
Streaming in Python
Streaming works the same way:
    from openai import OpenAI

    client = OpenAI(
        api_key="your-generalcompute-api-key",
        base_url="https://api.generalcompute.com/v1"
    )

    # stream=True returns an iterator of chunks instead of a single response
    stream = client.chat.completions.create(
        model="llama-4-maverick",
        messages=[{"role": "user", "content": "Write a haiku about latency."}],
        stream=True
    )

    for chunk in stream:
        # Skip chunks that carry no choices or no text (such as the final chunk)
        if chunk.choices and chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)

The stream=True flag, the chunk iterator, and the flush pattern all carry over unchanged.
Using Environment Variables
The cleaner approach for production code is to keep provider config in environment variables:
    import os
    from openai import OpenAI

    client = OpenAI(
        api_key=os.environ["GENERALCOMPUTE_API_KEY"],
        base_url=os.environ.get("LLM_BASE_URL", "https://api.generalcompute.com/v1")
    )
This pattern also makes it easy to switch back to OpenAI for testing -- just set different env vars rather than touching code.
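One way to make that switch explicit is to resolve the key, base URL, and model together from a single provider setting. The sketch below is only an illustration: the LLM_PROVIDER and LLM_MODEL variable names, the LLMConfig class, and the defaults table are assumptions made for this example, not part of either provider's API.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class LLMConfig:
    api_key: str
    base_url: Optional[str]  # None lets the OpenAI SDK use its default endpoint
    model: str


# Hypothetical per-provider defaults; adjust to the models you actually use.
_DEFAULTS = {
    "generalcompute": {
        "key_var": "GENERALCOMPUTE_API_KEY",
        "base_url": "https://api.generalcompute.com/v1",
        "model": "llama-4-maverick",
    },
    "openai": {
        "key_var": "OPENAI_API_KEY",
        "base_url": None,
        "model": "gpt-4o",
    },
}


def load_llm_config(env: Mapping[str, str] = os.environ) -> LLMConfig:
    """Pick provider settings from environment variables (names are assumptions)."""
    provider = env.get("LLM_PROVIDER", "generalcompute")
    if provider not in _DEFAULTS:
        raise ValueError(f"Unknown LLM_PROVIDER: {provider!r}")
    defaults = _DEFAULTS[provider]
    return LLMConfig(
        api_key=env[defaults["key_var"]],  # KeyError if the key is not set
        base_url=env.get("LLM_BASE_URL", defaults["base_url"]),
        model=env.get("LLM_MODEL", defaults["model"]),
    )
```

With this in place, the client can be built as OpenAI(api_key=cfg.api_key, base_url=cfg.base_url) and requests can pass model=cfg.model, so switching providers means changing LLM_PROVIDER rather than editing code.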
Node.js / TypeScript Migration
Before
    import OpenAI from "openai";

    const client = new OpenAI({
      apiKey: process.env.OPENAI_API_KEY,
    });

    const response = await client.chat.completions.create({
      model: "gpt-4o",
      messages: [{ role: "user", content: "What is speculative decoding?" }],
    });

    console.log(response.choices[0].message.content);
After
    import OpenAI from "openai";

    const client = new OpenAI({
      apiKey: process.env.GENERALCOMPUTE_API_KEY,
      baseURL: "https://api.generalcompute.com/v1",
    });

    const response = await client.chat.completions.create({
      model: "llama-4-maverick",
      messages: [{ role: "user", content: "What is speculative decoding?" }],
    });

    console.log(response.choices[0].message.content);
Three things changed: the apiKey source, the added baseURL option, and the model name. Note that the Node.js SDK uses baseURL (camelCase) while the Python SDK uses base_url (snake_case).
Streaming in Node.js
    import OpenAI from "openai";

    const client = new OpenAI({
      apiKey: process.env.GENERALCOMPUTE_API_KEY,
      baseURL: "https://api.generalcompute.com/v1",
    });

    const stream = client.chat.completions.stream({
      model: "llama-4-maverick",
      messages: [{ role: "user", content: "Describe transformer attention." }],
    });

    for await (const chunk of stream) {
      const text = chunk.choices[0]?.delta?.content ?? "";
      process.stdout.write(text);
    }
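The async iterator pattern works identically. If you're using .on("content", ...) event handlers or stream.finalMessage(), those work too. Older releases of the Node SDK expose this helper as client.beta.chat.completions.stream, so check which path your installed version provides.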
LangChain Migration
LangChain's ChatOpenAI class accepts the same base_url and api_key overrides.
Before
    from langchain_openai import ChatOpenAI

    llm = ChatOpenAI(
        model="gpt-4o",
        openai_api_key="sk-..."
    )

    result = llm.invoke("What are the trade-offs of quantization?")
    print(result.content)
After
    from langchain_openai import ChatOpenAI

    llm = ChatOpenAI(
        model="llama-4-maverick",
        openai_api_key="your-generalcompute-api-key",
        openai_api_base="https://api.generalcompute.com/v1"
    )

    result = llm.invoke("What are the trade-offs of quantization?")
    print(result.content)
The example uses openai_api_key and openai_api_base because LangChain wraps the OpenAI client and exposes its own field names; recent langchain_openai releases also accept api_key and base_url as aliases. Everything downstream -- chains, memory, callbacks, streaming -- works without modification.
LangChain Streaming
    from langchain_openai import ChatOpenAI
    from langchain_core.messages import HumanMessage

    llm = ChatOpenAI(
        model="llama-4-maverick",
        openai_api_key="your-generalcompute-api-key",
        openai_api_base="https://api.generalcompute.com/v1",
        streaming=True
    )

    for chunk in llm.stream([HumanMessage(content="Explain MoE routing.")]):
        print(chunk.content, end="", flush=True)
LangChain with LCEL
If you're using LangChain Expression Language, nothing changes in your chain definitions:
    from langchain_openai import ChatOpenAI
    from langchain_core.prompts import ChatPromptTemplate
    from langchain_core.output_parsers import StrOutputParser

    llm = ChatOpenAI(
        model="llama-4-maverick",
        openai_api_key="your-generalcompute-api-key",
        openai_api_base="https://api.generalcompute.com/v1"
    )

    prompt = ChatPromptTemplate.from_messages([
        ("system", "You are a helpful engineering assistant."),
        ("human", "{question}")
    ])

    chain = prompt | llm | StrOutputParser()

    print(chain.invoke({"question": "How does continuous batching work?"}))
The | chaining syntax, prompt templates, and output parsers all remain the same.
Other Libraries
Most libraries that wrap the OpenAI API follow the same pattern:
LlamaIndex:
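    # Requires the llama-index-llms-openai-like package.
    # The stock llama_index OpenAI class looks up context sizes by OpenAI model
    # name, so non-OpenAI model names generally need OpenAILike instead.
    from llama_index.llms.openai_like import OpenAILike

    llm = OpenAILike(
        model="llama-4-maverick",
        api_key="your-generalcompute-api-key",
        api_base="https://api.generalcompute.com/v1",
        is_chat_model=True  # use the chat completions endpoint
    )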
Instructor (structured outputs):
    import instructor
    from openai import OpenAI

    client = instructor.from_openai(
        OpenAI(
            api_key="your-generalcompute-api-key",
            base_url="https://api.generalcompute.com/v1"
        )
    )
Vercel AI SDK:
    import { createOpenAI } from "@ai-sdk/openai";

    const generalcompute = createOpenAI({
      apiKey: process.env.GENERALCOMPUTE_API_KEY,
      baseURL: "https://api.generalcompute.com/v1",
    });
The pattern is consistent: find the parameter that sets the API base URL, point it at GeneralCompute, and swap in your key.
Validation Checklist
After making the change, run through this checklist before deploying:
Basic connectivity
- [ ] A simple non-streaming completion returns a response
- [ ] Response has the expected shape (choices[0].message.content)
- [ ] No authentication errors (confirm your API key is correct)
Streaming
- [ ] Streaming completions produce chunks incrementally
- [ ] Stream terminates cleanly (no hanging connections)
- [ ] finish_reason is present on the final chunk
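The connectivity and streaming checks above can be turned into a few assertions. This is a minimal sketch that assumes responses and chunks follow the OpenAI chat-completion shapes; the helper names (check_completion_shape, check_stream, fake_chunk) are made up for this example and are not part of any SDK.

```python
from types import SimpleNamespace
from typing import Any, Iterable, List


def check_completion_shape(response: Any) -> str:
    """Return choices[0].message.content, or fail if the shape is off."""
    assert response.choices, "response has no choices"
    content = response.choices[0].message.content
    assert isinstance(content, str) and content, "empty or missing content"
    return content


def check_stream(chunks: Iterable[Any]) -> str:
    """Consume a stream and verify incremental output plus a finish_reason."""
    pieces: List[str] = []
    finish_reason = None
    for chunk in chunks:
        if not chunk.choices:  # tolerate chunks that carry no choices
            continue
        choice = chunk.choices[0]
        if choice.delta.content:
            pieces.append(choice.delta.content)
        if choice.finish_reason is not None:
            finish_reason = choice.finish_reason
    assert len(pieces) > 1, "expected text to arrive in more than one chunk"
    assert finish_reason is not None, "stream ended without a finish_reason"
    return "".join(pieces)


def fake_chunk(text, finish_reason=None):
    """Stand-in for a streamed chunk, for trying the checks offline."""
    delta = SimpleNamespace(content=text)
    choice = SimpleNamespace(delta=delta, finish_reason=finish_reason)
    return SimpleNamespace(choices=[choice])
```

Against a live endpoint, pass the object returned by client.chat.completions.create(...) to check_completion_shape, and the iterator returned with stream=True to check_stream. A stream that hangs will block the loop, so running the check under a test timeout also covers the clean-termination item.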
Your specific use case
- [ ] System prompts work as expected
- [ ] Multi-turn conversations (passing a messages array with history) produce coherent responses
- [ ] Tool/function calling works if your app uses it
- [ ] JSON mode works if your app uses response_format: { type: "json_object" }
Error handling
- [ ] Rate limit errors surface as expected exceptions
- [ ] Invalid model names return a clear error (not a silent failure)
Performance
- [ ] TTFT (time to first token) meets your expectations
- [ ] Token generation speed is acceptable for your use case
If anything in the checklist fails, the most common causes are: a mismatched model name (check the GeneralCompute docs for supported model identifiers), a base URL with a trailing slash that conflicts with path construction, or a library that hard-codes the OpenAI domain somewhere you haven't overridden yet.
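For the performance items in the checklist, a rough timing harness is often enough to compare providers side by side. The sketch below is one possible approach; StreamTiming, time_stream, and text_pieces are names invented for this example. It counts streamed chunks rather than tokens, so treat the throughput figure as approximate.

```python
import time
from typing import Any, Callable, Iterable, Iterator, NamedTuple


class StreamTiming(NamedTuple):
    ttft_seconds: float   # time until the first non-empty piece of text
    total_seconds: float  # time until the stream is exhausted
    pieces: int           # number of non-empty pieces (chunks, not tokens)


def time_stream(
    pieces: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> StreamTiming:
    """Measure time to first text and total duration of a text stream."""
    start = clock()
    first = None
    count = 0
    for piece in pieces:
        if not piece:
            continue
        if first is None:
            first = clock()
        count += 1
    end = clock()
    if first is None:
        raise ValueError("stream produced no text")
    return StreamTiming(first - start, end - start, count)


def text_pieces(client: Any, model: str, prompt: str) -> Iterator[str]:
    """Yield text from a streaming chat completion.

    Because this is a generator, the request is only sent on the first
    iteration, so time_stream's clock includes the request latency.
    """
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            yield chunk.choices[0].delta.content
```

A typical call would be time_stream(text_pieces(client, "llama-4-maverick", "Write a haiku about latency.")). Running it several times against each provider gives a more reliable comparison than a single measurement.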
Common Gotchas
Trailing slash in base URL. Some SDKs are sensitive to whether base_url ends with /v1 or /v1/. If you get 404s on endpoints, try removing or adding the trailing slash.
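If the base URL comes from configuration, normalizing it once at startup removes the guesswork. The helper below is a small sketch, not something either SDK provides: normalize_base_url is a hypothetical name, and settling on the form without a trailing slash is an assumption based on the URL used throughout this guide.

```python
def normalize_base_url(url: str) -> str:
    """Trim whitespace and trailing slashes so the URL ends in exactly /v1."""
    cleaned = url.strip().rstrip("/")
    if not cleaned.startswith(("http://", "https://")):
        raise ValueError(f"base URL must include a scheme: {url!r}")
    if not cleaned.endswith("/v1"):
        raise ValueError(f"expected base URL to end with /v1: {url!r}")
    return cleaned
```

If a particular library turns out to need the trailing slash, append it at that one call site rather than storing two variants of the URL.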
Library-specific parameter names. As shown above, different libraries use base_url, baseURL, api_base, or openai_api_base. Check the library's docs if the standard override isn't working.
Model name format. OpenAI uses names like gpt-4o and gpt-4o-mini. GeneralCompute uses names like llama-4-maverick and llama-4-scout. Make sure you've updated the model parameter everywhere it appears. An unrecognized model name returns an error rather than failing silently, but the change is easy to miss.
Embeddings. If your app calls client.embeddings.create(...), note that embedding models have different identifiers. Check the docs for the embedding model name you need.
Hardcoded URLs. Search your codebase for api.openai.com to catch any direct HTTP calls that bypass your client configuration.
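A plain text search is enough for this, and a short script makes it easy to rerun in CI. The following is a sketch using only the standard library; find_hardcoded_urls and the SKIP_DIRS list are assumptions for this example and should be adjusted to your repository layout.

```python
import os
from pathlib import Path
from typing import Iterator, Tuple

NEEDLE = "api.openai.com"
# Directories that usually hold dependencies or metadata, not your own code.
SKIP_DIRS = {".git", "node_modules", ".venv", "__pycache__"}


def find_hardcoded_urls(
    root: str, needle: str = NEEDLE
) -> Iterator[Tuple[str, int, str]]:
    """Yield (path, line number, line) for every line containing the needle."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune skipped directories in place so os.walk does not descend into them.
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        for name in filenames:
            path = Path(dirpath) / name
            try:
                text = path.read_text(encoding="utf-8")
            except (UnicodeDecodeError, OSError):
                continue  # skip binary or unreadable files
            for lineno, line in enumerate(text.splitlines(), start=1):
                if needle in line:
                    yield str(path), lineno, line.strip()
```

Looping over find_hardcoded_urls(".") and printing each hit gives a quick list of call sites to review before switching traffic.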
Getting Started
Get a GeneralCompute API key at generalcompute.com. The migration is small enough that it's worth running in a branch and A/B testing the response quality before fully committing. Most teams find the behavior is equivalent for standard chat and completion tasks, with meaningfully faster token generation.
If you run into anything that behaves differently from what you expected, the GeneralCompute docs cover the full API surface including supported parameters and model options.