
Recommended Features

This table shows the current correctness and performance test status of each recommended feature.

Feature CorrectnessTest PerformanceTest
Chunked Prefill
DCN-based P/D disaggregation unverified
KV cache host offloading unverified unverified
LoRA_Torch
Multimodal Inputs
Out-of-tree model support
Prefix Caching
Single Program Multiple Data (SPMD)
Single-Host-P-D-disaggregation N/A N/A
Speculative Decoding: Eagle3
Speculative Decoding: Ngram
async scheduler
data_parallelism unverified
runai_model_streamer_loader N/A
sampling_params N/A
structured_decoding N/A
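The "Speculative Decoding: Ngram" entry covers drafting candidate tokens by matching n-grams against the existing context, with no separate draft model. Below is a minimal, self-contained sketch of that general prompt-lookup idea. It is not the tpu-inference implementation. The function names `propose_ngram_draft` and `accept_greedy` and their parameters are hypothetical. The sketch also assumes greedy verification: the target model keeps the longest prefix of the draft that matches its own predictions, then adds one token of its own.

```python
from typing import List, Sequence


def propose_ngram_draft(tokens: Sequence[int], min_n: int, max_n: int, k: int) -> List[int]:
    """Hypothetical n-gram (prompt-lookup) drafter.

    Tries the longest suffix first (max_n down to min_n, assuming min_n >= 1),
    finds its most recent earlier occurrence in `tokens`, and returns up to `k`
    tokens that followed it. Returns [] if no suffix matches.
    """
    length = len(tokens)
    for n in range(min(max_n, length - 1), min_n - 1, -1):
        suffix = list(tokens[length - n:])
        # Scan earlier start positions, most recent first; the upper bound
        # excludes the suffix itself and guarantees at least one following token.
        for start in range(length - n - 1, -1, -1):
            if list(tokens[start:start + n]) == suffix:
                return list(tokens[start + n:start + n + k])
    return []


def accept_greedy(draft: Sequence[int], target_next: Sequence[int]) -> List[int]:
    """Hypothetical greedy verification step.

    `target_next[i]` is the target model's greedy token at draft position i,
    so it has len(draft) + 1 entries. Keeps the matching draft prefix, then
    appends the target's own token at the first mismatch (or after the draft).
    """
    accepted: List[int] = []
    for drafted, predicted in zip(draft, target_next):
        if drafted != predicted:
            break
        accepted.append(drafted)
    accepted.append(target_next[len(accepted)])
    return accepted


if __name__ == "__main__":
    context = [1, 2, 3, 4, 9, 1, 2, 3]
    draft = propose_ngram_draft(context, min_n=1, max_n=3, k=3)
    print("draft:", draft)
    print("accepted:", accept_greedy(draft, [4, 9, 7, 5]))
```

In this sketch, one target-model pass can yield several tokens whenever the context repeats itself. That is why n-gram drafting tends to help most on inputs with a lot of repetition, such as code editing or summarization that quotes the prompt.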

Kernel Support

This table shows the current kernel support status.

Feature CorrectnessTest PerformanceTest
Collective Communication Matmul unverified
MLA unverified unverified
MoE unverified unverified
Quantized Attention unverified unverified
Quantized KV Cache unverified unverified
Quantized Matmul unverified unverified
Ragged Paged Attention V3

Parallelism Support

This table shows the current support status for context (CP), data (DP), expert (EP), pipeline (PP), sequence (SP), and tensor (TP) parallelism.

Feature CorrectnessTest PerformanceTest
CP unverified unverified
DP unverified
EP unverified
PP
SP unverified unverified
TP unverified

Quantization Support

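This table shows the current quantization support status. In the scheme names, WxAy denotes x-bit weights and y-bit activations; for example, W4A16 pairs 4-bit weights with 16-bit activations.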

Feature Recommended TPU Generations CorrectnessTest PerformanceTest
AWQ INT4 v5, v6 unverified unverified
FP4 W4A16 v7 unverified unverified
FP8 W8A8 v7 unverified unverified
FP8 W8A16 v7 unverified unverified
INT4 W4A16 v5, v6 unverified unverified
INT8 W8A8 v5, v6 unverified unverified
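Scheme names such as W8A16 and W8A8 give the weight bit width first and the activation bit width second. The pure-Python sketch below contrasts the two. It assumes symmetric per-row scales (per output channel for weights, per token for activations) and Python's round-half-to-even rounding. Real TPU kernels may use different scale granularity, bit packing, and accumulation, and every function name here is hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def quantize_rows(m: Matrix, bits: int) -> Tuple[List[List[int]], List[float]]:
    """Hypothetical symmetric per-row quantization to signed `bits`-bit ints."""
    qmax = 2 ** (bits - 1) - 1  # 127 for 8 bits, 7 for 4 bits
    q_rows: List[List[int]] = []
    scales: List[float] = []
    for row in m:
        amax = max(abs(v) for v in row)
        scale = amax / qmax if amax > 0 else 1.0
        # round() is round-half-to-even; clamp to the symmetric range [-qmax, qmax].
        q_rows.append([max(-qmax, min(qmax, round(v / scale))) for v in row])
        scales.append(scale)
    return q_rows, scales


def reference_matmul(x: Matrix, w: Matrix) -> Matrix:
    """Full-precision y = x @ w.T, with w laid out as [out_features][in_features]."""
    return [[sum(a * b for a, b in zip(x_row, w_row)) for w_row in w] for x_row in x]


def matmul_wxa16(x: Matrix, w_q: List[List[int]], w_scales: List[float]) -> Matrix:
    """WxA16: integer weights, activations kept in floating point.
    Each output is rescaled by its weight row's scale (on-the-fly dequantization)."""
    return [[sum(a * q for a, q in zip(x_row, w_row)) * s
             for w_row, s in zip(w_q, w_scales)] for x_row in x]


def matmul_w8a8(x: Matrix, w_q: List[List[int]], w_scales: List[float]) -> Matrix:
    """W8A8: activations are also quantized to int8 per row, the dot product is
    accumulated in integers, and the result is rescaled by both scales."""
    x_q, x_scales = quantize_rows(x, bits=8)
    return [[sum(a * b for a, b in zip(xq_row, w_row)) * xs * ws
             for w_row, ws in zip(w_q, w_scales)]
            for xq_row, xs in zip(x_q, x_scales)]


def max_abs_error(a: Matrix, b: Matrix) -> float:
    return max(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


if __name__ == "__main__":
    x = [[0.5, -1.0, 2.0]]
    w = [[1.0, 0.0, -0.5], [0.25, 0.75, 1.0]]
    exact = reference_matmul(x, w)
    w8, s8 = quantize_rows(w, bits=8)
    w4, s4 = quantize_rows(w, bits=4)
    print("W8A16 error:", max_abs_error(matmul_wxa16(x, w8, s8), exact))
    print("W8A8  error:", max_abs_error(matmul_w8a8(x, w8, s8), exact))
    print("W4A16 error:", max_abs_error(matmul_wxa16(x, w4, s4), exact))
```

On these small inputs, the 4-bit weights produce a noticeably larger error than the 8-bit ones. Lower-bit schemes accept that accuracy cost in exchange for smaller weights and less memory traffic.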