Benchmark summary

Provider: MAX. Real-world LLM inference benchmarks. Model: GPT-OSS-120B. All measurements include network overhead.

  • Mean TTFT: 738ms (2.6x faster than Together AI)
  • Mean E2E Latency: 1.76s (4.6x faster than Together AI)

Head-to-head vs Together AI

  • Time to First Token: General Compute 738ms vs Together AI 1899ms (2.6x faster).
  • End-to-End Latency: General Compute 1.76s vs Together AI 8.05s (4.6x faster).
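As a quick sanity check on the reported speedup factors, the ratios follow directly from the raw numbers above (rounded to one decimal place):

```python
# Speedup = competitor latency / measured latency, using the figures above.
ttft_speedup = 1899 / 738   # Time to First Token: ~2.57, reported as 2.6x
e2e_speedup = 8.05 / 1.76   # End-to-End Latency: ~4.57, reported as 4.6x

print(f"TTFT speedup: {ttft_speedup:.1f}x")
print(f"E2E speedup:  {e2e_speedup:.1f}x")
```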

Methodology

  • Model: GPT-OSS-120B with identical prompts sent to all providers simultaneously.
  • Workload categories: Short (50 tokens) and Long (1,000 tokens).
  • Metrics: TTFT, end-to-end latency, and pure generation rate (excluding TTFT overhead).
  • All measurements include network overhead for all providers.
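The metrics above can be sketched in Python. This is a minimal illustration, not the benchmark harness itself: `benchmark_stream` and `fake_stream` are hypothetical names, and the simulated delays stand in for a real streaming inference call. TTFT is the delay before the first token, end-to-end latency covers the whole response, and pure generation rate divides the remaining tokens by the time after the first token arrived.

```python
import time

def benchmark_stream(token_iter):
    """Measure TTFT, end-to-end latency, and pure generation rate
    (tokens/sec, excluding TTFT overhead) over a token stream."""
    start = time.monotonic()
    ttft = None
    n_tokens = 0
    for _ in token_iter:
        if ttft is None:
            # Time to First Token: delay until the first token arrives.
            ttft = time.monotonic() - start
        n_tokens += 1
    e2e = time.monotonic() - start
    # Pure generation: tokens after the first, over time after the first.
    gen_time = e2e - (ttft or 0.0)
    gen_rate = (n_tokens - 1) / gen_time if n_tokens > 1 and gen_time > 0 else 0.0
    return {"ttft_s": ttft, "e2e_s": e2e, "gen_tok_per_s": gen_rate}

def fake_stream(n=50, first_delay=0.05, per_token=0.005):
    """Simulated provider stream: one slow first token, then steady output."""
    time.sleep(first_delay)
    yield "tok"
    for _ in range(n - 1):
        time.sleep(per_token)
        yield "tok"

metrics = benchmark_stream(fake_stream())
print(metrics)
```

In a real run, the same prompt would be streamed from each provider over the network (so TTFT and E2E include network overhead, as above), and results aggregated across the short and long workload categories.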
