Deploy any model — ours or yours — with inference speed that changes what's possible.
Hand this prompt to your autonomous coding agent and it'll walk through our signup flow on your behalf.
Sign me up for a General Compute API account and get an API key. Fetch instructions from https://docs.generalcompute.com/agent-signup and follow them.
Both running GPT OSS 120B — same model, same prompt. Only the infrastructure is different.
Ready to compare
Try preset prompts or enter your own to compare inference speed in real-time
We rethought the entire stack from silicon to site selection — so you get better performance at a fraction of the cost.
MiniMax M2.5 model comparison
Throughput (tokens/sec)*
Higher is better
Energy Usage*
Lower is better
Energy Cost
Lower is better
*Projected on next-generation racks. NVIDIA throughput via Together AI benchmarks. Energy figures compare the US commercial average electricity rate with our rate.
From prototyping to production, we have the infrastructure to match your needs.
REST API with OpenAI-compatible endpoints. Access the fastest models with a single API key.
Get API Key

Dedicated infrastructure with SLAs, custom scaling, and guaranteed capacity for your workloads.

Contact Sales

Deploy any model on our optimized infrastructure. Same speed, your weights.

Learn More

Real numbers, real infrastructure, real-time performance.
Faster Inference
Time to First Token
Uptime SLA
Tokens per Second
*Performance varies by model and geography.
OpenAI-compatible API. Change your base URL and API key — that's it. Your existing code works instantly, just faster.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.generalcompute.com",
    api_key="your-api-key",
)

response = client.chat.completions.create(
    model="gpt-oss-120b",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
)

Get your API key in seconds. OpenAI-compatible — just change your base URL and start shipping faster.
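With stream=True the client returns an iterator of chunks rather than a single response. A minimal sketch of how those chunks are typically consumed, assuming the standard OpenAI streaming chunk shape (choices[0].delta.content); the fake_chunk helper below is a stand-in so the snippet runs without a live key:

```python
from types import SimpleNamespace

def collect_stream(chunks):
    """Assemble the full reply text from a streamed chat completion.

    Assumes the standard OpenAI streaming shape: each chunk carries
    the next text fragment in chunk.choices[0].delta.content, which
    is None on control chunks (e.g. the final one).
    """
    parts = []
    for chunk in chunks:
        delta = chunk.choices[0].delta.content
        if delta:
            parts.append(delta)
    return "".join(parts)

# Stand-in chunks with the same shape as the real stream, so the
# helper can be exercised without a network call or an API key.
def fake_chunk(text):
    return SimpleNamespace(
        choices=[SimpleNamespace(delta=SimpleNamespace(content=text))]
    )

print(collect_stream([fake_chunk("Hel"), fake_chunk("lo!"), fake_chunk(None)]))
```

With a live client, pass the `response` iterator from the example above directly to `collect_stream` (or print each delta as it arrives for a typing effect).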