Recommended Features

This table shows which features are currently tested for correctness and performance.

Feature	CorrectnessTest	PerformanceTest
Chunked Prefill	✅	✅
DCN-based P/D disaggregation	unverified	✅
KV cache host offloading	unverified	unverified
LoRA_Torch	✅	✅
Multimodal Inputs	✅	✅
Out-of-tree model support	✅	✅
Prefix Caching	✅	✅
Single Program Multi Data	✅	✅
Single-Host-P-D-disaggregation	N/A	N/A
Speculative Decoding: Eagle3	✅	✅
Speculative Decoding: Ngram	✅	✅
async scheduler	✅	✅
data_parallelism	✅	unverified
runai_model_streamer_loader	✅	N/A
sampling_params	✅	N/A
structured_decoding	✅	N/A
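
To pick out the features that pass both tests, the table can be filtered programmatically. The following is a minimal sketch, assuming the rows are available as tab-separated text. The helper names `parse_status_table` and `fully_verified` are hypothetical and only for illustration, and the sample rows are a subset copied from the table above.

```python
# A few rows copied from the feature table above, tab-separated:
# feature, correctness test status, performance test status.
TABLE = """\
Chunked Prefill\t✅\t✅
DCN-based P/D disaggregation\tunverified\t✅
KV cache host offloading\tunverified\tunverified
data_parallelism\t✅\tunverified
sampling_params\t✅\tN/A
"""


def parse_status_table(text):
    """Turn tab-separated rows into a list of dicts (hypothetical helper)."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        feature, correctness, performance = line.split("\t")
        rows.append(
            {
                "feature": feature,
                "correctness": correctness,
                "performance": performance,
            }
        )
    return rows


def fully_verified(rows):
    """Return the names of features marked ✅ in both columns."""
    return [
        r["feature"]
        for r in rows
        if r["correctness"] == "✅" and r["performance"] == "✅"
    ]


rows = parse_status_table(TABLE)
print(fully_verified(rows))
```

Entries marked `unverified` or `N/A` in either column are excluded, so for this sample only `Chunked Prefill` is returned.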

This table shows the current kernel support status.

Feature	CorrectnessTest	PerformanceTest
Collective Communication Matmul	✅	unverified
MLA	unverified	unverified
MoE	unverified	unverified
Quantized Attention	unverified	unverified
Quantized KV Cache	unverified	unverified
Quantized Matmul	unverified	unverified
Ragged Paged Attention V3	✅	✅

This table shows the current parallelism support status.

This table shows the current quantization support status.

Feature	Recommended TPU Generations	CorrectnessTest	PerformanceTest
AWQ INT4	v5, v6	unverified	unverified
FP4 W4A16	v7	unverified	unverified
FP8 W8A8	v7	unverified	unverified
FP8 W8A16	v7	unverified	unverified
INT4 W4A16	v5, v6	unverified	unverified
INT8 W8A8	v5, v6	unverified	unverified
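
The scheme labels in this table appear to follow the common `WxAy` convention, where `x` is the weight bit width and `y` is the activation bit width. The table itself does not spell this out, so treat that reading as an assumption. Under it, a label can be decoded with a small sketch like the one below. The function `parse_scheme` is a hypothetical helper, not part of any library.

```python
import re


def parse_scheme(label):
    """Extract weight/activation bit widths from a label such as 'FP8 W8A16'.

    Assumes the 'W<bits>A<bits>' naming convention. Returns None when the
    label carries no such suffix (for example 'AWQ INT4').
    """
    match = re.search(r"W(\d+)A(\d+)", label)
    if match is None:
        return None
    return {
        "weight_bits": int(match.group(1)),
        "activation_bits": int(match.group(2)),
    }


for label in ["FP4 W4A16", "FP8 W8A8", "INT8 W8A8", "AWQ INT4"]:
    print(label, "->", parse_scheme(label))
```

For example, `FP8 W8A16` would decode to 8-bit weights with 16-bit activations, while `AWQ INT4` names only the weight format and yields `None`.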