Agent Readout
Coding agent workload profile
Numbers relevant if you orchestrate autonomous coding loops.
- Median completion
- 8.6s / request
- Cost delta
- 60% cheaper vs hyperscalers
Observed behavior
- 7x faster than Together AI on 1k-token multi-step patches.
- Input cost: $0.45 / 1M tokens, output $0.60 / 1M tokens.
- Use streaming to keep developers updated while the agent plans.
Recommended settings
- Set
temperature=0.2,max_tokens=900,stream=true. - Retry with jittered exponential backoff if HTTP 529 (rare, occurs at >500 RPS).