Up to 1,000 tokens/second by ditching the real bottleneck: GPUs
OpenCode running the identical task on both sides. A real coding session at measured operating points.

The same hardware sits behind all three. Begin on the public API, graduate to dedicated capacity, or bring your own model — the speed comes along for the ride.
OpenAI-compatible REST endpoints. Swap your base URL, keep your code.
Dedicated capacity with SLAs and guaranteed throughput, sized to your traffic.
Ship your own weights on the same optimized stack. Your model, our hardware.
Hand this prompt to any autonomous coding agent. It handles the whole signup flow and comes back with a key — no dashboard, no forms.
Sign me up for a General Compute API account and get an API key. Fetch instructions from https://docs.generalcompute.com/agent-signup and follow them.
GPT-OSS-120B on General Compute against the GPU cloud baseline — same model, same prompts, measured head to head.
See the full methodology and every model →Faster time to first token
Higher output throughput
Lower end-to-end latency
Output on GPT-OSS-120B
OpenAI-compatible API. Change your base URL, swap your key, and you're running on ASIC infrastructure. Your existing code doesn't change.
View Docsfrom openai import OpenAI
client = OpenAI(
base_url="https://api.generalcompute.com",
api_key="your-api-key",
)
response = client.chat.completions.create(
model="gpt-oss-120b",
messages=[{"role": "user", "content": "Hello!"}],
stream=True,
)Get your API key in seconds. OpenAI-compatible — just change your base URL. $10 free credit to see the difference yourself.