Recommended Features
This table shows which features are currently tested for correctness and performance.
| Feature | CorrectnessTest | PerformanceTest |
|---|---|---|
| Chunked Prefill | ✅ | ✅ |
| DCN-based P/D disaggregation | unverified | ✅ |
| KV cache host offloading | unverified | unverified |
| LoRA_Torch | ✅ | ✅ |
| Multimodal Inputs | ✅ | ✅ |
| Out-of-tree model support | ✅ | ✅ |
| Prefix Caching | ✅ | ✅ |
| Single Program Multi Data | ✅ | ✅ |
| Single-Host-P-D-disaggregation | N/A | N/A |
| Speculative Decoding: Eagle3 | ✅ | ✅ |
| Speculative Decoding: Ngram | ✅ | ✅ |
| async scheduler | ✅ | ✅ |
| data_parallelism | ✅ | unverified |
| runai_model_streamer_loader | ✅ | N/A |
| sampling_params | ✅ | N/A |
| structured_decoding | ✅ | N/A |
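To query a support table like the one above programmatically, the following is a minimal sketch that parses pipe-delimited rows and lists the features marked ✅ in both columns. The helper names `parse_support_table` and `fully_verified` are hypothetical and not part of any library; the sample rows are copied from the table above.

```python
def parse_support_table(markdown: str) -> list[dict[str, str]]:
    """Parse a pipe-delimited Markdown table into a list of row dicts."""
    rows = []
    for line in markdown.strip().splitlines():
        # Drop the outer pipes, then split the line into trimmed cells.
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        rows.append(cells)
    # rows[0] is the header; rows[1] is the |---| separator line.
    header, body = rows[0], rows[2:]
    return [dict(zip(header, cells)) for cells in body]


def fully_verified(rows: list[dict[str, str]]) -> list[str]:
    """Return features whose correctness and performance tests are both ✅."""
    return [
        row["Feature"]
        for row in rows
        if row["CorrectnessTest"] == "✅" and row["PerformanceTest"] == "✅"
    ]


# Sample rows copied from the feature table; not the full table.
SAMPLE = """
| Feature | CorrectnessTest | PerformanceTest |
|---|---|---|
| Chunked Prefill | ✅ | ✅ |
| DCN-based P/D disaggregation | unverified | ✅ |
| KV cache host offloading | unverified | unverified |
| sampling_params | ✅ | N/A |
"""

rows = parse_support_table(SAMPLE)
print(fully_verified(rows))
```

The same helpers apply to any table that uses the `Feature`, `CorrectnessTest`, and `PerformanceTest` column names; extra columns are carried through as additional dict keys.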
Kernel Support
This table shows the current kernel support status.
| Feature | CorrectnessTest | PerformanceTest |
|---|---|---|
| Collective Communication Matmul | ✅ | unverified |
| MLA | unverified | unverified |
| MoE | unverified | unverified |
| Quantized Attention | unverified | unverified |
| Quantized KV Cache | unverified | unverified |
| Quantized Matmul | unverified | unverified |
| Ragged Paged Attention V3 | ✅ | ✅ |
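Parallelism Support
This table shows the current parallelism support status. The abbreviations stand for context (CP), data (DP), expert (EP), pipeline (PP), sequence (SP), and tensor (TP) parallelism.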
| Feature | CorrectnessTest | PerformanceTest |
|---|---|---|
| CP | unverified | unverified |
| DP | ✅ | unverified |
| EP | ✅ | unverified |
| PP | ✅ | ✅ |
| SP | unverified | unverified |
| TP | ✅ | unverified |
Quantization Support
This table shows the current quantization support status.
| Feature | Recommended TPU Generations | CorrectnessTest | PerformanceTest |
|---|---|---|---|
| AWQ INT4 | v5, v6 | unverified | unverified |
| FP4 W4A16 | v7 | unverified | unverified |
| FP8 W8A8 | v7 | unverified | unverified |
| FP8 W8A16 | v7 | unverified | unverified |
| INT4 W4A16 | v5, v6 | unverified | unverified |
| INT8 W8A8 | v5, v6 | unverified | unverified |