All roles are based in San Francisco, on-site.
Own the DevOps and hardware side of our inference stack: GPU/ASIC load balancing, model placement across racks based on live utilization, and end-to-end latency optimization.
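To give a flavor of the placement problem this role owns, here is a minimal sketch in Python. It is illustrative only, not our production system: the `Rack` model, the greedy least-utilized heuristic, and all names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Rack:
    """A rack of accelerators with live utilization (hypothetical model)."""
    name: str
    capacity_gb: float   # total accelerator memory on the rack
    utilization: float   # live load, 0.0 (idle) to 1.0 (saturated)
    used_gb: float = 0.0
    models: list = field(default_factory=list)

def place_model(model: str, size_gb: float, racks: list) -> Rack:
    """Greedy placement: pick the least-utilized rack with enough free memory."""
    candidates = [r for r in racks if r.capacity_gb - r.used_gb >= size_gb]
    if not candidates:
        raise RuntimeError(f"no rack has {size_gb} GB free for {model}")
    best = min(candidates, key=lambda r: r.utilization)
    best.models.append(model)
    best.used_gb += size_gb
    return best

racks = [Rack("rack-a", 640, 0.72), Rack("rack-b", 640, 0.31)]
print(place_model("model-x", 140, racks).name)  # -> rack-b (lowest live load)
```

The real problem is harder: placements interact with routing, utilization is a moving target, and migrations have a latency cost. The sketch just pins down the shape of the decision.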
Build the serving runtime on top of our ASIC hardware: batching, KV-cache management, request scheduling, and the OpenAI-compatible API surface.
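For a rough sense of the scheduling side of this role, here is a toy continuous-batching loop in Python, assuming a fixed KV-cache token budget. Again purely a sketch: `Request`, `Scheduler`, and the budget accounting are hypothetical stand-ins, not our runtime.

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class Request:
    """An in-flight generation request (hypothetical shape)."""
    rid: str
    prompt_tokens: int
    max_tokens: int
    generated: int = 0

class Scheduler:
    """Continuous batching under a fixed KV-cache token budget."""

    def __init__(self, kv_budget_tokens: int = 4096):
        self.kv_budget = kv_budget_tokens
        self.waiting: deque = deque()
        self.running: list = []

    def _kv_tokens(self, r: Request) -> int:
        # KV-cache footprint grows by one token per decode step.
        return r.prompt_tokens + r.generated

    def step(self) -> list:
        """Admit waiting requests while the budget allows, decode one step."""
        used = sum(self._kv_tokens(r) for r in self.running)
        while self.waiting and used + self._kv_tokens(self.waiting[0]) <= self.kv_budget:
            r = self.waiting.popleft()
            self.running.append(r)
            used += self._kv_tokens(r)
        for r in self.running:
            r.generated += 1
        batch = list(self.running)
        # Evict requests that have hit their generation limit.
        self.running = [r for r in self.running if r.generated < r.max_tokens]
        return batch

sched = Scheduler(kv_budget_tokens=1024)
sched.waiting.extend([Request("a", 200, 32), Request("b", 900, 16)])
print([r.rid for r in sched.step()])  # ["a"]: admitting "b" would exceed the budget
```

The actual runtime has to make these calls per decode step, across heterogeneous hardware, while keeping the API surface compatible with OpenAI clients.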