Own the hardware and DevOps side of our inference stack. You'll set infrastructure strategy, run GPU / ASIC load balancing and model placement, and keep end-to-end latency as low as physics allows.
Responsibilities
Lead infrastructure strategy for our inference fleet, from rack layout and power to the load balancer that fronts it.
Own GPU / ASIC load balancing and model placement across racks, driven by live utilization and tail latency.
Drive end-to-end inference latency down across the full client-to-token path.
Own the physical layer: rack density, power, cooling, cabling, and top-of-rack fabric.
Lead DevOps and SRE: observability, deployment, on-call, and incident response for the production fleet.
Partner with ASIC vendors and firmware teams on bring-up, drivers, and hardware qualification.
Hire and grow the infrastructure team.
What we're looking for
7+ years in infrastructure, SRE, or platform engineering at scale.
Deep experience operating large GPU or accelerator fleets in production.
Hands-on expertise with load balancing and scheduling that target utilization and tail latency, not just request rate.
Strong grasp of rack-level topology (fabric, PCIe, NUMA, top-of-rack networking) and how it shows up in latency.
Comfortable at the hardware boundary: firmware, drivers, thermals, and power distribution.
Track record of leading engineering teams and owning production on-call.
Nice to have
Experience with custom inference ASICs, TPUs, or non-NVIDIA accelerators.
Background in large-scale model serving (vLLM, TGI, TensorRT-LLM, or custom runtimes).
Network fabric design at data center scale (RoCE, InfiniBand).