We raised $15M to build the world's fastest neocloud.
Read the whitepaper

Inference at the speed of light

Up to 1,000 tokens/second by ditching the real bottleneck: GPUs

Same model. Same output. 14.7x faster.

A real-time replay of a measured benchmark — GPT-OSS-120B, both providers generating exactly 1,000 tokens. These are the actual latencies, re-enacted as a stream.

Real Benchmarks

Faster than the GPU cloud baseline.

GPT-OSS-120B on General Compute vs. the GPU cloud baseline, at a 256-token context. Same model, same prompts — measured head to head.

Metric (mean)               General Compute   GPU Cloud Baseline   Speedup
Time to First Token (ms)    1,136             8,084                7.1x faster
End-to-End Latency (ms)     2,119             34,261               16.2x faster
Output Throughput (tok/s)   1,025             121                  8.5x faster

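
The speedup figures follow directly from the mean values quoted above. As a quick check, the arithmetic can be sketched as below. The dictionary keys and the `speedup` helper are names made up for this illustration; only the numbers come from the benchmark. For latency metrics, lower is better, so the ratio is baseline over General Compute; for throughput it is the other way round.

```python
# Mean values copied from the benchmark above (GPT-OSS-120B, 256-token context).
# The key names and the helper below are illustrative, not part of any API.
BENCHMARKS = {
    "time_to_first_token_ms": {
        "general_compute": 1136, "gpu_baseline": 8084, "lower_is_better": True,
    },
    "end_to_end_latency_ms": {
        "general_compute": 2119, "gpu_baseline": 34261, "lower_is_better": True,
    },
    "output_throughput_tok_s": {
        "general_compute": 1025, "gpu_baseline": 121, "lower_is_better": False,
    },
}


def speedup(general_compute, gpu_baseline, lower_is_better):
    """Return how many times better General Compute is than the baseline."""
    if lower_is_better:
        return gpu_baseline / general_compute
    return general_compute / gpu_baseline


# Round to one decimal place, matching how the page reports the ratios.
SPEEDUPS = {name: round(speedup(**row), 1) for name, row in BENCHMARKS.items()}

for name, factor in SPEEDUPS.items():
    print(f"{name}: {factor}x")
```
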
Sign up with your agent

Your agent can sign up for you.

Hand this prompt to any autonomous coding agent. It'll handle the whole signup flow and return with an API key ready to use.

Codex, Claude Code, OpenCode, Cursor, Aider
Sign me up for a General Compute API account and get an API key. Fetch instructions from https://docs.generalcompute.com/agent-signup and follow them.

From first API call to full production.

Whether you're prototyping with our models or deploying your own weights at scale — same hardware, same speed, your choice of setup.

API Access

REST API with OpenAI-compatible endpoints. Access the fastest models with a single API key.

Get API Key

Custom Deployments

Dedicated infrastructure with SLAs, custom scaling, and guaranteed capacity for your workloads.

Contact Sales

Bring Your Own Model

Deploy any model on our optimized infrastructure. Same speed, your weights.

Learn More

The numbers GPU clouds can't match.

Faster Inference*

Time to First Token*

Uptime SLA

Tokens per Second

*Performance varies by model and geography.

Switch in 30 seconds.
No GPU required.

OpenAI-compatible API. Change your base URL, swap your key, and you're running on ASIC infrastructure. Your existing code doesn't change.

View Docs
main.py
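from openai import OpenAI

# Point the standard OpenAI client at General Compute.
client = OpenAI(
    base_url="https://api.generalcompute.com",
    api_key="your-api-key",  # replace with your General Compute API key
)

# Request a streamed chat completion.
response = client.chat.completions.create(
    model="gpt-oss-120b",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
)

# Print each piece of text as it arrives.
for chunk in response:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)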
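
To see the difference yourself, you can time a stream from any provider in the same way. The helper below is a minimal sketch, not the harness used for the benchmarks on this page: `measure_stream` and its return keys are hypothetical names, it counts streamed text pieces rather than true tokens, and it measures throughput only over the time after the first piece arrives.

```python
import time
from typing import Callable, Iterable, Optional


def measure_stream(
    pieces: Iterable[Optional[str]],
    clock: Callable[[], float] = time.perf_counter,
) -> dict:
    """Consume a stream of text pieces and report simple timing figures.

    Illustrative helper: a "piece" is one streamed text delta, which is
    only an approximation of one token.
    """
    start = clock()
    first_at: Optional[float] = None
    count = 0

    for piece in pieces:
        if not piece:
            continue  # skip empty or None deltas (e.g. role-only chunks)
        if first_at is None:
            first_at = clock()  # first visible text: time to first token
        count += 1

    end = clock()
    generation_s = (end - first_at) if first_at is not None else 0.0

    return {
        "pieces": count,
        "ttft_s": (first_at - start) if first_at is not None else None,
        "total_s": end - start,
        # Pieces per second after the first piece arrived.
        "pieces_per_s": count / generation_s if generation_s > 0 else None,
    }
```

With the client from main.py, one way to use it would be to create the timer just before the request and pass in the text deltas, for example `measure_stream(c.choices[0].delta.content for c in response if c.choices)`. Note that this starts the clock when iteration begins, so for a full time-to-first-token figure the request itself should be issued inside the generator being measured.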
$10 in free credit when you sign up

Stop paying the GPU tax.

Get your API key in seconds. OpenAI-compatible — just change your base URL. $10 free credit to see the difference yourself.

General Compute — World's Fastest AI Inference