Recommended Model and Feature Matrices¶
Although vLLM TPU’s new unified backend enables high-performance serving out of the box for any model supported in vLLM, a few core components are still being implemented. Until those capabilities land, we recommend starting from the list of stress-tested models and features below.
We are still landing components in tpu-inference that will improve performance for larger-scale, higher-complexity models (XL MoE, vision encoders, MLA, etc.).
If you’d like us to prioritize something specific, please submit a GitHub feature request here.
Recommended Models¶
The table below lists the models currently tested for accuracy and performance.
Models¶
| Model | Type | Unit Tests | Accuracy/Correctness | Benchmark |
|---|---|---|---|---|
| Qwen/Qwen2.5-VL-7B-Instruct | Multimodal | ✅ | ✅ | ✅ |
| Qwen/Qwen3-Omni-30B-A3B-Instruct | Multimodal | unverified | unverified | unverified |
| meta-llama/Llama-4-Maverick-17B-128E-Instruct | Multimodal | unverified | unverified | unverified |
| Qwen/Qwen3-30B-A3B | Text | ✅ | ✅ | ✅ |
| Qwen/Qwen3-32B | Text | ✅ | ✅ | ✅ |
| Qwen/Qwen3-4B | Text | ✅ | ✅ | ✅ |
| Qwen/Qwen3-Coder-480B-A35B-Instruct | Text | unverified | unverified | unverified |
| deepseek-ai/DeepSeek-V3.1 | Text | unverified | unverified | unverified |
| google/gemma-3-27b-it | Text | ✅ | ✅ | ✅ |
| meta-llama/Llama-3.1-8B-Instruct | Text | ✅ | ✅ | ✅ |
| meta-llama/Llama-3.3-70B-Instruct | Text | ✅ | ✅ | ✅ |
| meta-llama/Llama-Guard-4-12B | Text | ✅ | ✅ | ✅ |
| moonshotai/Kimi-K2-Thinking | Text | unverified | unverified | unverified |
| openai/gpt-oss-120b | Text | unverified | unverified | unverified |
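
Any model in the table above can be launched with vLLM's standard serving entry point. The command below is a minimal sketch using one of the verified models; the parallelism and context-length values are illustrative placeholders, not tuned recommendations for any particular TPU topology:

```shell
# Serve a verified model from the table with vLLM's OpenAI-compatible server.
# --tensor-parallel-size should match the number of TPU chips available;
# the values here are illustrative only.
vllm serve meta-llama/Llama-3.1-8B-Instruct \
    --tensor-parallel-size 8 \
    --max-model-len 4096
```

Once the server is up, it exposes an OpenAI-compatible API (on port 8000 by default), so existing OpenAI client code can be pointed at it without modification.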