Build the inference runtime on top of our ASIC hardware: batching, KV cache, scheduling, and the OpenAI-compatible API surface.
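
To make the scope concrete, below is a minimal Python sketch of the kind of continuous-batching decode loop such a runtime revolves around: a block-based KV cache, a scheduler that admits waiting requests as memory frees up, and a per-step decode pass. Every name here (`KVCache`, `Scheduler`, `Request`, `BLOCK_SIZE`, `NUM_BLOCKS`) is an illustrative assumption, not the actual codebase, and the token-generation line is a stand-in for the real ASIC decode kernel.

```python
# Sketch of a continuous-batching scheduler over a block-based KV cache.
# All names and constants are hypothetical; real KV storage lives on the ASIC.
from collections import deque
from dataclasses import dataclass, field

BLOCK_SIZE = 16    # tokens per KV-cache block (assumed)
NUM_BLOCKS = 1024  # total blocks on the device (assumed)


@dataclass
class Request:
    req_id: int
    prompt_len: int
    max_new_tokens: int
    generated: int = 0
    blocks: list = field(default_factory=list)  # KV-cache blocks owned


class KVCache:
    """Tracks free KV-cache blocks; only bookkeeping, no tensor storage."""

    def __init__(self, num_blocks: int):
        self.free = deque(range(num_blocks))

    def alloc(self, n: int):
        if len(self.free) < n:
            return None  # out of memory; caller waits for a later step
        return [self.free.popleft() for _ in range(n)]

    def release(self, blocks):
        self.free.extend(blocks)


class Scheduler:
    """Continuous batching: admit waiting requests whenever blocks free up."""

    def __init__(self, cache: KVCache, max_batch: int = 8):
        self.cache = cache
        self.max_batch = max_batch
        self.waiting: deque = deque()
        self.running: list = []

    def submit(self, req: Request):
        self.waiting.append(req)

    def admit(self):
        while self.waiting and len(self.running) < self.max_batch:
            req = self.waiting[0]
            needed = -(-req.prompt_len // BLOCK_SIZE)  # ceil division
            blocks = self.cache.alloc(needed)
            if blocks is None:
                break  # not enough KV memory; retry on the next step
            req.blocks = blocks
            self.running.append(self.waiting.popleft())

    def step(self):
        """One decode iteration: each running request emits one token."""
        self.admit()
        finished = []
        for req in self.running:
            total = req.prompt_len + req.generated
            if total % BLOCK_SIZE == 0:  # current block full, grow by one
                extra = self.cache.alloc(1)
                if extra is None:
                    continue  # skip this request; a real runtime would preempt
                req.blocks += extra
            req.generated += 1  # stand-in for the actual ASIC decode kernel
            if req.generated >= req.max_new_tokens:
                finished.append(req)
        for req in finished:
            self.running.remove(req)
            self.cache.release(req.blocks)  # reclaim blocks for waiters
        return finished


if __name__ == "__main__":
    sched = Scheduler(KVCache(NUM_BLOCKS))
    for i in range(3):
        sched.submit(Request(req_id=i, prompt_len=40, max_new_tokens=5))
    while sched.running or sched.waiting:
        for done in sched.step():
            print(f"request {done.req_id} finished")
```

Block-based allocation is what lets batching and scheduling interact cleanly: sequences grow their KV footprint one block at a time, so freed blocks from a finished request can immediately admit a waiting one. The OpenAI-compatible API layer would sit in front of `Scheduler.submit`, translating each incoming completion request into a `Request` and streaming tokens back as `step()` produces them.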