Agent Readout
Benchmark summary
Real-world LLM inference benchmarks from ProviderMAX. Model: GPT-OSS-120B. All measurements include network overhead.
- Mean TTFT: 738ms (2.6x faster than Together AI)
- Mean E2E latency: 1.76s (4.6x faster than Together AI)
Head-to-head vs Together AI
- Time to First Token: General Compute 738ms vs Together AI 1899ms (2.6x faster).
- End-to-End Latency: General Compute 1.76s vs Together AI 8.05s (4.6x faster).
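The speedup factors above are simply the ratio of the two measurements: 1899 ms / 738 ms ≈ 2.6 for TTFT, and 8.05 s / 1.76 s ≈ 4.6 for end-to-end latency.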
Methodology
- Model: GPT-OSS-120B with identical prompts sent to all providers simultaneously.
- Workload categories: Short (50 tokens) and Long (1,000 tokens).
- Metrics: TTFT, end-to-end latency, and pure generation rate (excluding TTFT overhead); see the measurement sketch after this list.
- All measurements include network overhead for all providers.
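The readout does not include the benchmark harness itself, so the following is only a minimal sketch of how these three metrics could be collected against an OpenAI-compatible streaming endpoint. The endpoint URL, model identifier, API-key environment variable, and the one-content-delta-per-token approximation are all assumptions for illustration, not details taken from the benchmark.

```python
import json
import os
import time

import requests

# Hypothetical OpenAI-compatible streaming endpoint and credentials;
# the actual provider URLs and auth used in the benchmark are not shown here.
ENDPOINT = "https://api.example-provider.com/v1/chat/completions"
API_KEY = os.environ["PROVIDER_API_KEY"]


def measure_request(prompt: str, max_tokens: int) -> dict:
    """Send one streaming request and record TTFT, E2E latency, and generation rate."""
    start = time.perf_counter()
    ttft = None
    tokens = 0

    resp = requests.post(
        ENDPOINT,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={
            "model": "gpt-oss-120b",  # assumed model identifier
            "messages": [{"role": "user", "content": prompt}],
            "max_tokens": max_tokens,
            "stream": True,
        },
        stream=True,
        timeout=120,
    )
    resp.raise_for_status()

    # Iterate over server-sent events; each "data:" line carries one chunk.
    for line in resp.iter_lines():
        if not line or not line.startswith(b"data: "):
            continue
        payload = line[len(b"data: "):]
        if payload == b"[DONE]":
            break
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"].get("content")
        if delta:
            if ttft is None:
                ttft = time.perf_counter() - start  # time to first token
            tokens += 1  # rough proxy: one content delta ~ one token

    e2e = time.perf_counter() - start
    # Pure generation rate excludes TTFT overhead: tokens after the first,
    # divided by the time spent streaming them.
    if ttft is not None and tokens > 1 and e2e > ttft:
        gen_rate = (tokens - 1) / (e2e - ttft)
    else:
        gen_rate = 0.0
    return {"ttft_s": ttft, "e2e_s": e2e, "tokens": tokens, "gen_rate_tok_s": gen_rate}


# Example: the two workload categories from the methodology.
short = measure_request("Summarize this readout in one sentence.", max_tokens=50)
long = measure_request("Write a detailed analysis of the benchmark.", max_tokens=1000)
```

In a real head-to-head run, the same prompt would be dispatched to every provider concurrently (for example with a thread pool or asyncio) so that all of them see the same request at the same time, as the methodology describes.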