Infrastructure / Leadership

Head of Infrastructure

San Francisco, CA · On-site · Full-time

Own the hardware and DevOps side of our inference stack. You'll set infrastructure strategy, run GPU / ASIC load balancing and model placement, and keep end-to-end latency as low as physics allows.

Responsibilities

  • Lead infrastructure strategy for our inference fleet, from rack layout and power to the load balancer that fronts it.
  • Own GPU / ASIC load balancing and model placement across racks, driven by live utilization and tail latency.
  • Drive end-to-end inference latency down across the full client-to-token path.
  • Own the physical layer: rack density, power, cooling, cabling, and top-of-rack fabric.
  • Lead DevOps and SRE: observability, deployment, on-call rotation, and incident response for the production fleet.
  • Partner with ASIC vendors and firmware teams on bring-up, drivers, and hardware qualification.
  • Hire and grow the infrastructure team.

What we're looking for

  • 7+ years in infrastructure, SRE, or platform engineering at scale.
  • Deep experience operating large GPU or accelerator fleets in production.
  • Hands-on expertise with load balancing and scheduling that target utilization and tail latency, not just request rate.
  • Strong grasp of rack-level topology (fabric, PCIe, NUMA, top-of-rack networking) and how it shows up in latency.
  • Comfortable at the hardware boundary: firmware, drivers, thermals, and power distribution.
  • Track record of leading engineering teams and owning production on-call.

Nice to have

  • Experience with custom inference ASICs, TPUs, or non-NVIDIA accelerators.
  • Background in large-scale model serving (vLLM, TGI, TensorRT-LLM, or custom runtimes).
  • Network fabric design at data center scale (RoCE, InfiniBand).