Real-world LLM inference performance benchmarks comparing General Compute against leading providers.
[Chart: Mean TTFT (ms)*]
[Chart: Mean Latency (s)*]
* Includes network overhead for all providers
Our benchmarks are designed to reflect real-world LLM inference performance across different workload types. All tests run GPT-OSS-120B, with identical prompts dispatched simultaneously to General Compute and each competing provider.
We test two workload categories: Short (50 tokens) and Long (1,000 tokens). For each request we measure Time to First Token (TTFT) and end-to-end latency, which lets us calculate the pure generation rate by excluding TTFT overhead from the total. Together these give a clear view of both responsiveness and sustained throughput.
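As a rough illustration of the methodology, the sketch below shows how these metrics can be derived from a single streaming request. `stream_completion` and its chunk format are hypothetical stand-ins for any streaming client, not a specific provider SDK, and it assumes one token per streamed chunk.

```python
import time

def measure_request(stream_completion, prompt, max_tokens):
    """Measure TTFT, end-to-end latency, and pure generation rate
    for one streaming request (stream_completion is a hypothetical
    streaming client callable)."""
    start = time.perf_counter()
    first_token_at = None
    tokens = 0

    for chunk in stream_completion(prompt=prompt, max_tokens=max_tokens):
        if first_token_at is None:
            # First token arrives: everything before this is TTFT
            # (queueing, prefill, and network overhead).
            first_token_at = time.perf_counter()
        tokens += 1  # assumes one token per streamed chunk

    end = time.perf_counter()
    ttft = first_token_at - start  # seconds until the first token
    latency = end - start          # end-to-end request latency

    # Pure generation rate: tokens produced after the first one,
    # divided by the time spent generating (TTFT excluded). This
    # isolates sustained throughput from the overhead in TTFT.
    gen_rate = (tokens - 1) / (latency - ttft) if tokens > 1 else 0.0
    return ttft, latency, gen_rate
```

Running this against each provider with the same prompt at the same time yields the paired samples behind the charts on this page.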
[Chart: Pure generation rate vs. generation latency (excludes TTFT overhead)]
[Chart: TTFT distribution by provider and workload type]