We raised $15M to build the world's fastest neocloud.
Head of Infrastructure

New York, NY · On-site · Full-time

Own the infrastructure layer of our inference cloud end-to-end. Today that means the control plane, the gateway in front of our ASIC fleet, and the observability stack that tells us where every millisecond goes. Over the next 6-8 months it grows into a heterogeneous fleet: ASICs for decode, GPUs for prefill, and the physical-layer ownership that comes with it.

The first six months are hands-on: k8s manifests, dashboards, on-call, and a direct line to our ASIC partner's engineering team when production behaves strangely. The team grows under you from there.

Responsibilities

  • Own the inference control plane. Today it's built on configuration provided by our ASIC partner; you'll be the person who understands it deeply enough to modify, extend, and eventually replace pieces of it.
  • Own the gateway and load balancer that fronts the fleet. Model placement, request routing, and tail-latency engineering live here, driven by live utilization and per-model SLOs.
  • Own observability end-to-end. Per-request tracing from OpenRouter ingress through to the accelerator, with p50/p95/p99 dashboards, SLOs, and alerting that wakes the right person.
  • Run capacity planning against a real, distributed traffic mix across the open-weight models we serve.
  • Own the operational side of the ASIC partnership. Most weird production issues route through their engineering team until we build that expertise in-house, and you'll be our technical face in those conversations.
  • Bring up the prefill side of our disaggregated architecture on a second hardware platform as it comes online. Different vendor, different fabric, different kernels.
  • Build the on-call and incident-response practice from scratch. Hire and grow the team underneath you.

What we're looking for

  • 7+ years in infrastructure, SRE, or platform engineering, with at least some of it at a serious inference, ML, or HPC shop.
  • Hands-on with Kubernetes at production scale — not just deploying, but debugging the weird stuff.
  • Strong instincts for tail latency. You think about p99 and utilization as the same problem, not different ones.
  • Comfortable owning a vendor relationship where the vendor's bugs are now your production issues.
  • Track record of building observability practices that actually catch problems, not just generate dashboards.
  • Have been on call through real incidents and can talk about what you learned.
  • Want to be the first infra hire at something early, not the tenth at something big.

Nice to have

  • Experience operating non-NVIDIA accelerators in production — TPUs, ASICs, or alternative GPU vendors.
  • Background with model-serving stacks (vLLM, TGI, TensorRT-LLM, SGLang).
  • Network fabric experience at data-center scale (RoCE, InfiniBand).
  • Have hired and managed an infra team before.
  • Comfort at the hardware boundary — firmware, drivers, thermals — for when the roadmap takes us there.