Installation¶
This guide provides instructions for installing and running tpu-inference.
There are three ways to install tpu-inference:
Install using pip via uv¶
We recommend using uv (`uv pip install`) instead of standard pip, as it improves installation speed.

1. Create a working directory.
2. Install `uv` and set up a Python virtual environment.
3. Install `vllm-tpu` using `uv` or `pip`.
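The steps above might look like the following. This is a sketch, not the guide's exact commands: the directory name and Python version are examples, and the `vllm-tpu` package name comes from this guide.

```shell
# Create a working directory (name is an example)
mkdir -p ~/vllm-tpu && cd ~/vllm-tpu

# Install uv and create a virtual environment (Python version is an example)
pip install uv
uv venv --python 3.12
source .venv/bin/activate

# Install vllm-tpu with uv (or with plain pip: pip install vllm-tpu)
uv pip install vllm-tpu
```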
Run with Docker¶
Include the --privileged, --net=host, and --shm-size=150gb options to enable TPU interaction and shared memory.
export DOCKER_URI=vllm/vllm-tpu:latest
sudo docker run -it --rm --name $USER-vllm --privileged --net=host \
-v /dev/shm:/dev/shm \
--shm-size 150gb \
-p 8000:8000 \
--entrypoint /bin/bash ${DOCKER_URI}
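Once inside the container, you can sanity-check the setup by serving a model and querying the API. The model name below is only an illustration; because the container runs with `--net=host`, the endpoint is also reachable from the host.

```shell
# Inside the container: start an OpenAI-compatible server (example model)
vllm serve Qwen/Qwen2.5-1.5B-Instruct &

# Once the server is up, list the available models
curl http://localhost:8000/v1/models
```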
Install from source¶
For debugging or development purposes, you can install tpu-inference from source. tpu-inference is a plugin for vllm, so you need to install both from source.
1. Install system dependencies.
2. Clone the `vllm` and `tpu-inference` repositories.
3. Install `uv` and set up a Python virtual environment.
4. Install `vllm` from source, targeting the TPU device.
   NOTE: The `tpu-inference` repo pins a `vllm` revision in its `vllm_lkg.version` file, so make sure to check out the proper revision beforehand.
5. Install `tpu-inference` from source.
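A sketch of the steps above follows. The repository URLs, the Debian package names, and the assumption that `vllm_lkg.version` contains a single git revision are all illustrative, not confirmed by this guide; adjust them to your environment.

```shell
# Install system dependencies (example for Debian/Ubuntu)
sudo apt-get update && sudo apt-get install -y git build-essential

# Clone both repositories (URLs are assumptions)
git clone https://github.com/vllm-project/vllm.git
git clone https://github.com/vllm-project/tpu-inference.git

# Install uv and set up a virtual environment
pip install uv
uv venv && source .venv/bin/activate

# Check out the vllm revision pinned by tpu-inference, then install for TPU
cd vllm
git checkout "$(cat ../tpu-inference/vllm_lkg.version)"
VLLM_TARGET_DEVICE=tpu uv pip install -e .
cd ..

# Install tpu-inference from source
uv pip install -e ./tpu-inference
```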
Verify Installation¶
To quickly verify that the installation succeeded with any of the above methods and that vllm-tpu is correctly configured, run:
python -c '
import jax
import vllm
import importlib.metadata
from vllm.platforms import current_platform
tpu_version = importlib.metadata.version("tpu_inference")
print(f"vllm version: {vllm.__version__}")
print(f"tpu_inference version: {tpu_version}")
print(f"vllm platform: {current_platform.get_device_name()}")
print(f"jax backends: {jax.devices()}")
'
# Expected output:
# vllm version: 0.x.x
# tpu_inference version: 0.x.x
# vllm platform: TPU V6E (or your specific TPU architecture)
# jax backends: [TpuDevice(id=0, process_index=0, coords=(0,0,0), core_on_chip=0), ...]