Benchmark summary

Provider: MAX. Real-world LLM inference benchmarks. Model: GPT-OSS-120B. All measurements include network overhead.

  • Mean TTFT: 738ms (2.6x faster than Together AI)
  • Mean E2E Latency: 1.76s (4.6x faster than Together AI)

Head-to-head vs Together AI

  • Time to First Token: General Compute 738ms vs Together AI 1899ms (2.6x faster).
  • End-to-End Latency: General Compute 1.76s vs Together AI 8.05s (4.6x faster).
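As a quick sanity check on the reported speedup factors, the ratios follow directly from the raw numbers above (rounded to one decimal place):

```python
# Speedup = competitor latency / measured latency, using the figures above.
ttft_speedup = 1899 / 738   # Time to First Token: ~2.57, reported as 2.6x
e2e_speedup = 8.05 / 1.76   # End-to-End Latency: ~4.57, reported as 4.6x

print(f"TTFT speedup: {ttft_speedup:.1f}x")
print(f"E2E speedup:  {e2e_speedup:.1f}x")
```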

Methodology

  • Model: GPT-OSS-120B with identical prompts sent to all providers simultaneously.
  • Workload categories: Short (50 tokens) and Long (1,000 tokens).
  • Metrics: TTFT, end-to-end latency, and pure generation rate (excluding TTFT overhead).
  • All measurements include network overhead for all providers.
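The metrics above can be sketched in Python. This is a minimal illustration, not the benchmark harness itself: `benchmark_stream` and `fake_stream` are hypothetical names, and the simulated delays stand in for a real streaming inference call. TTFT is the delay before the first token, end-to-end latency covers the whole response, and pure generation rate divides the remaining tokens by the time after the first token arrived.

```python
import time

def benchmark_stream(token_iter):
    """Measure TTFT, end-to-end latency, and pure generation rate
    (tokens/sec, excluding TTFT overhead) over a token stream."""
    start = time.monotonic()
    ttft = None
    n_tokens = 0
    for _ in token_iter:
        if ttft is None:
            # Time to First Token: delay until the first token arrives.
            ttft = time.monotonic() - start
        n_tokens += 1
    e2e = time.monotonic() - start
    # Pure generation: tokens after the first, over time after the first.
    gen_time = e2e - (ttft or 0.0)
    gen_rate = (n_tokens - 1) / gen_time if n_tokens > 1 and gen_time > 0 else 0.0
    return {"ttft_s": ttft, "e2e_s": e2e, "gen_tok_per_s": gen_rate}

def fake_stream(n=50, first_delay=0.05, per_token=0.005):
    """Simulated provider stream: one slow first token, then steady output."""
    time.sleep(first_delay)
    yield "tok"
    for _ in range(n - 1):
        time.sleep(per_token)
        yield "tok"

metrics = benchmark_stream(fake_stream())
print(metrics)
```

In a real run, the same prompt would be streamed from each provider over the network (so TTFT and E2E include network overhead, as above), and results aggregated across the short and long workload categories.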
