Up to 1,000 tokens/second by ditching the real bottleneck: GPUs
A real-time replay of a measured benchmark: GPT-OSS-120B, with both providers generating exactly 1,000 tokens. The latencies shown are the measured ones, re-enacted as a stream.
GPT-OSS-120B on General Compute vs. the GPU cloud baseline, at a 256-token context. Same model, same prompts — measured head to head.
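The benchmark compares latency and throughput between the two providers. To produce comparable numbers from your own streamed runs, you could start from the sketch below. It assumes you record a timestamp for each output token. It also assumes throughput means output tokens divided by total request time. General Compute's exact methodology is not stated here, so treat these definitions as assumptions. `StreamStats`, `stream_stats` and `mean_stats` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class StreamStats:
    ttft_ms: float            # request start -> first token, in milliseconds
    total_ms: float           # request start -> last token, in milliseconds
    tokens_per_second: float  # output tokens / total request time (assumed definition)


def stream_stats(request_start: float, token_times: Sequence[float]) -> StreamStats:
    """Summarise one streamed response.

    request_start and token_times are timestamps in seconds (for example from
    time.perf_counter()); token_times holds one entry per output token.
    """
    if not token_times:
        raise ValueError("no tokens received")
    ttft = token_times[0] - request_start
    total = token_times[-1] - request_start
    # Assumption: throughput counts every output token over the whole request.
    tps = len(token_times) / total if total > 0 else float("inf")
    return StreamStats(ttft * 1000, total * 1000, tps)


def mean_stats(runs: Sequence[StreamStats]) -> StreamStats:
    """Average several runs, as in a 'Mean (ms)' / 'Mean (tok/s)' summary."""
    if not runs:
        raise ValueError("no runs to average")
    n = len(runs)
    return StreamStats(
        sum(r.ttft_ms for r in runs) / n,
        sum(r.total_ms for r in runs) / n,
        sum(r.tokens_per_second for r in runs) / n,
    )
```

For example, a run whose four tokens arrive at 0.25, 0.5, 0.75 and 1.0 seconds after the request gives a 250 ms time to first token, 1,000 ms total, and 4 tokens per second. Averaging per-run throughput, rather than pooling all tokens across runs, is one choice among several. It weights every run equally.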
Hand this prompt to any autonomous coding agent. It'll handle the whole signup flow and return with an API key ready to use.
Sign me up for a General Compute API account and get an API key. Fetch instructions from https://docs.generalcompute.com/agent-signup and follow them.
Whether you're prototyping with our models or deploying your own weights at scale — same hardware, same speed, your choice of setup.
REST API with OpenAI-compatible endpoints. Access the fastest models with a single API key.
Dedicated infrastructure with SLAs, custom scaling, and guaranteed capacity for your workloads.
Deploy any model on our optimized infrastructure. Same speed, your weights.
*Performance varies by model and geography.
OpenAI-compatible API. Change your base URL, swap your key, and you're running on ASIC infrastructure. Nothing else in your existing code has to change.
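```python
from openai import OpenAI

# Point the standard OpenAI client at General Compute.
client = OpenAI(
    base_url="https://api.generalcompute.com",
    api_key="your-api-key",
)

response = client.chat.completions.create(
    model="gpt-oss-120b",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
)

# With stream=True the call returns an iterator of chunks; print tokens as they arrive.
for chunk in response:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
```

Get your API key in seconds. The API is OpenAI-compatible, so you only change your base URL. Start with $10 in free credit and see the difference yourself.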
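Because the endpoints are OpenAI-compatible, the SDK call corresponds to a plain HTTPS request, and you can make it without the `openai` package. The standard-library sketch below assumes the service follows OpenAI's wire format: a POST to `/chat/completions` under the base URL, authenticated with a Bearer token, with streamed output sent as server-sent events. In that format each event is a `data: {...}` line and the stream ends with `data: [DONE]`. Check the exact path and event format against the provider's docs. `build_chat_request` and `iter_content` are hypothetical helper names.

```python
import json
import urllib.request
from typing import Iterable, Iterator

# Assumption: the OpenAI Python client appends "/chat/completions" to base_url,
# so this is the URL the SDK example would call.
BASE_URL = "https://api.generalcompute.com"


def build_chat_request(api_key: str, prompt: str,
                       model: str = "gpt-oss-120b") -> urllib.request.Request:
    """Build (but do not send) a streaming chat-completions request."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,
    }
    return urllib.request.Request(
        BASE_URL + "/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def iter_content(lines: Iterable[bytes]) -> Iterator[str]:
    """Yield text deltas from OpenAI-style server-sent events (assumed format)."""
    for raw in lines:
        line = raw.decode("utf-8").strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


# Usage (makes a real network call, so it is left commented out):
# req = build_chat_request("your-api-key", "Hello!")
# with urllib.request.urlopen(req) as resp:  # iterating resp yields raw lines
#     for text in iter_content(resp):
#         print(text, end="", flush=True)
```

Keeping request construction separate from stream parsing lets you check each piece offline. For example, you can feed `iter_content` a recorded list of event lines instead of a live response.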