Real-world LLM inference performance benchmarks comparing General Compute against leading providers.
[Chart: Mean TTFT (ms)*]
[Chart: Mean Latency (s)*]
* Includes network overhead for all providers
Our benchmarks are designed to reflect real-world LLM inference performance across different workload types. All tests run GPT-OSS-120B, with identical prompts dispatched simultaneously to General Compute and each competing provider.
We test two workload categories: Short (50 tokens) and Long (1,000 tokens). For each request we measure Time to First Token (TTFT) and end-to-end latency, which lets us calculate the pure generation rate by excluding TTFT overhead from the total. Together these give a clear view of both responsiveness and sustained throughput.
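As a rough illustration of the methodology, the sketch below shows how these metrics can be derived from a single streaming request. `stream_completion` and its chunk format are hypothetical stand-ins for any streaming client, not a specific provider SDK, and it assumes one token per streamed chunk.

```python
import time

def measure_request(stream_completion, prompt, max_tokens):
    """Measure TTFT, end-to-end latency, and pure generation rate
    for one streaming request (stream_completion is a hypothetical
    streaming client callable)."""
    start = time.perf_counter()
    first_token_at = None
    tokens = 0

    for chunk in stream_completion(prompt=prompt, max_tokens=max_tokens):
        if first_token_at is None:
            # First token arrives: everything before this is TTFT
            # (queueing, prefill, and network overhead).
            first_token_at = time.perf_counter()
        tokens += 1  # assumes one token per streamed chunk

    end = time.perf_counter()
    ttft = first_token_at - start  # seconds until the first token
    latency = end - start          # end-to-end request latency

    # Pure generation rate: tokens produced after the first one,
    # divided by the time spent generating (TTFT excluded). This
    # isolates sustained throughput from the overhead in TTFT.
    gen_rate = (tokens - 1) / (latency - ttft) if tokens > 1 else 0.0
    return ttft, latency, gen_rate
```

Running this against each provider with the same prompt at the same time yields the paired samples behind the charts on this page.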
[Chart: Pure generation rate vs. generation latency (excludes TTFT overhead)]
[Chart: TTFT distribution by provider and workload type]