Installation¶
This guide provides instructions for installing and running tpu-inference.
There are three ways to install tpu-inference:
Install using pip via uv¶
We recommend using uv (`uv pip install`) instead of standard pip, as it improves installation speed.

1. Create a working directory.
2. Install `uv` and set up a Python virtual environment.
3. Install `vllm-tpu` using `uv` or `pip`.
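The steps above might look like the following. This is a sketch, not the guide's exact commands: the directory name and Python version are examples, and the `vllm-tpu` package name comes from this guide.

```shell
# Create a working directory (name is an example)
mkdir -p ~/vllm-tpu && cd ~/vllm-tpu

# Install uv and create a virtual environment (Python version is an example)
pip install uv
uv venv --python 3.12
source .venv/bin/activate

# Install vllm-tpu with uv (or with plain pip: pip install vllm-tpu)
uv pip install vllm-tpu
```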
Run with Docker¶
Include the --privileged, --net=host, and --shm-size=150gb options to enable TPU interaction and shared memory.
export DOCKER_URI=vllm/vllm-tpu:latest
sudo docker run -it --rm --name $USER-vllm --privileged --net=host \
-v /dev/shm:/dev/shm \
--shm-size 150gb \
-p 8000:8000 \
--entrypoint /bin/bash ${DOCKER_URI}
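Once inside the container, you can sanity-check the setup by serving a model and querying the API. The model name below is only an illustration; because the container runs with `--net=host`, the endpoint is also reachable from the host.

```shell
# Inside the container: start an OpenAI-compatible server (example model)
vllm serve Qwen/Qwen2.5-1.5B-Instruct &

# Once the server is up, list the available models
curl http://localhost:8000/v1/models
```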
Install from source¶
For debugging or development purposes, you can install tpu-inference from source. tpu-inference is a plugin for vllm, so you need to install both from source.
1. Install system dependencies.
2. Clone the `vllm` and `tpu-inference` repositories.
3. Install `uv` and set up a Python virtual environment.
4. Install `vllm` from source, targeting the TPU device.
   NOTE: The `tpu-inference` repo pins a `vllm` revision in its `vllm_lkg.version` file, so make sure to check out the proper revision beforehand.
5. Install `tpu-inference` from source.
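A sketch of the steps above follows. The repository URLs, the Debian package names, and the assumption that `vllm_lkg.version` contains a single git revision are all illustrative, not confirmed by this guide; adjust them to your environment.

```shell
# Install system dependencies (example for Debian/Ubuntu)
sudo apt-get update && sudo apt-get install -y git build-essential

# Clone both repositories (URLs are assumptions)
git clone https://github.com/vllm-project/vllm.git
git clone https://github.com/vllm-project/tpu-inference.git

# Install uv and set up a virtual environment
pip install uv
uv venv && source .venv/bin/activate

# Check out the vllm revision pinned by tpu-inference, then install for TPU
cd vllm
git checkout "$(cat ../tpu-inference/vllm_lkg.version)"
VLLM_TARGET_DEVICE=tpu uv pip install -e .
cd ..

# Install tpu-inference from source
uv pip install -e ./tpu-inference
```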
Verify Installation¶
To quickly verify that the installation succeeded with any of the above methods and that vllm-tpu is correctly configured, run:
python -c '
import jax
import vllm
import importlib.metadata
from vllm.platforms import current_platform
tpu_version = importlib.metadata.version("tpu_inference")
print(f"vllm version: {vllm.__version__}")
print(f"tpu_inference version: {tpu_version}")
print(f"vllm platform: {current_platform.get_device_name()}")
print(f"jax backends: {jax.devices()}")
'
# Expected output:
# vllm version: 0.x.x
# tpu_inference version: 0.x.x
# vllm platform: TPU V6E (or your specific TPU architecture)
# jax backends: [TpuDevice(id=0, process_index=0, coords=(0,0,0), core_on_chip=0), ...]