Install NVIDIA Tesla K20x drivers & CUDA Toolkit on Proxmox 8.3

18 February 2025

Prepare the host

Add the non-free repository

Debian 12, on which Proxmox 8 is based, does not ship proprietary drivers in its main or contrib components, so we'll add the non-free and non-free-firmware components to the host's APT sources:

sudo sed -i 's/contrib$/contrib non-free non-free-firmware/' /etc/apt/sources.list
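
Before editing the real file, it can help to see what the substitution does. The sketch below runs the same sed expression against a scratch file instead of /etc/apt/sources.list. The sample lines and mirror URLs are assumptions about a typical Proxmox 8 host, not taken from a real system.

```shell
# Scratch file with sample entries; the mirror URLs are assumptions.
tmp="$(mktemp)"
cat > "$tmp" <<'EOF'
deb http://ftp.debian.org/debian bookworm main contrib
deb http://security.debian.org/debian-security bookworm-security main contrib
# a comment line that must stay untouched
EOF

# Same expression as above, but without -i: the result goes to stdout
# and the scratch file itself is left unchanged.
result="$(sed 's/contrib$/contrib non-free non-free-firmware/' "$tmp")"
printf '%s\n' "$result"
rm -f "$tmp"
```

Only lines that end in contrib are rewritten, so running the real command a second time changes nothing: after the first run those lines end in non-free-firmware.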

Update the host

Before we begin, we'll make sure our system is up to date:

sudo apt-get update
sudo apt-get dist-upgrade -y

Reboot (optional)

If proxmox-kernel-* was updated, we should reboot our Proxmox host:

sudo reboot
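
One rough way to tell whether a reboot is due is to compare the running kernel with the newest kernel image in /boot. This is a minimal sketch: reboot_needed is a hypothetical helper, and it assumes kernel images are named vmlinuz-<version> and that the newest installed kernel is the one that will boot (which is not true if a kernel has been pinned).

```shell
# Hypothetical helper: exit 0 when the newest kernel image found in a
# directory differs from the running kernel version.
reboot_needed() {
    running="$1"   # e.g. "$(uname -r)"
    bootdir="$2"   # e.g. /boot
    # Strip the vmlinuz- prefix, version-sort, and keep the highest one.
    newest="$(ls "$bootdir" 2>/dev/null | sed -n 's/^vmlinuz-//p' | sort -V | tail -n 1)"
    [ -n "$newest" ] && [ "$newest" != "$running" ]
}

if reboot_needed "$(uname -r)" /boot; then
    echo "A newer kernel is installed; reboot recommended."
else
    echo "Running the newest installed kernel (or none found in /boot)."
fi
```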

Install the Tesla 470 driver

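The K20x is a Kepler-generation GPU, and the 470 series is the last NVIDIA driver branch that supports Kepler, so we install Debian's Tesla 470 driver package: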

sudo apt-get install -y nvidia-tesla-470-driver
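
Once the package is installed, a quick query confirms that the driver is loaded and sees the card. This is a sketch rather than part of the original procedure; the driver_status variable is only there for illustration. If nvidia-smi is present but cannot talk to the driver, a kernel module that failed to build is a likely cause, and missing kernel headers (the proxmox-headers packages, as an assumption about this setup) are worth checking first.

```shell
# Report index, model name and driver version for every visible GPU.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=index,name,driver_version --format=csv
    driver_status="present"
else
    echo "nvidia-smi not found; the driver package may not be installed" >&2
    driver_status="missing"
fi
```

The index column is the GPU id used by the systemd templates later on.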

Install compatible CUDA Toolkit

To install a compatible version of the CUDA Toolkit, we download the runfile for CUDA Toolkit 11.4, which was released alongside the 470 driver branch, from NVIDIA and execute it:

wget https://developer.download.nvidia.com/compute/cuda/11.4.0/local_installers/cuda_11.4.0_470.42.01_linux.run
sudo sh cuda_11.4.0_470.42.01_linux.run --override --toolkit --silent

Installer flags explanation

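  • The --override flag skips the installer's compatibility checks, including the GCC version requirement
  • The --toolkit flag tells the silent installer to install only the CUDA Toolkit, not the driver bundled in the runfile
  • The --silent flag runs the installer without prompts, which implies acceptance of the EULA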
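
The runfile does not put the toolkit on the PATH by itself. A minimal sketch for the current shell follows, assuming the installer's default prefix of /usr/local/cuda-11.4; adjust CUDA_HOME if a different prefix was chosen.

```shell
# Assumption: default runfile install prefix.
CUDA_HOME="${CUDA_HOME:-/usr/local/cuda-11.4}"
export PATH="$CUDA_HOME/bin:$PATH"
# Append any existing LD_LIBRARY_PATH only if it is non-empty.
export LD_LIBRARY_PATH="$CUDA_HOME/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

# Confirm the compiler is reachable.
if command -v nvcc >/dev/null 2>&1; then
    nvcc --version
else
    echo "nvcc not found under $CUDA_HOME/bin" >&2
fi
```

To make this permanent, the two export lines can go into a shell profile such as ~/.profile.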

Enable persistence mode

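Persistence mode keeps the driver loaded and the GPU initialized even when no client is using it, so settings such as the power limit stay applied instead of resetting when the last client exits. The driver ships with nvidia-persistenced.service for this purpose, but it did not work on this setup, so we will disable and mask it and enable persistence mode through nvidia-smi instead.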

sudo systemctl disable --now nvidia-persistenced.service
sudo systemctl mask nvidia-persistenced.service

Until next boot

sudo nvidia-smi -pm 1
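
To confirm the setting took effect, nvidia-smi can report the persistence state per GPU. The sketch below lists the indices of GPUs where it is still off; gpus_without_persistence is a hypothetical helper, and it assumes the "index, state" row layout that --format=csv,noheader produces.

```shell
# Hypothetical helper: reads "index, persistence_mode" rows on stdin and
# prints the index of every GPU whose state is not "Enabled".
gpus_without_persistence() {
    awk -F', ' '$2 != "Enabled" { print $1 }'
}

if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=index,persistence_mode --format=csv,noheader \
        | gpus_without_persistence
fi
```

No output means every GPU has persistence mode enabled.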

Persist after boot

Because persistence mode doesn't survive a reboot, we need a systemd unit to enable it at boot. We're going to create a template unit, which lets us enable or disable persistence per GPU on multi-GPU systems: the instance name after the @ is passed to the unit as %i. The following command creates the new template file:

sudo systemctl edit --force --full nvidia-tesla-k20-persistence@.service

Add the following to the template service file:

[Unit]
Description=Set NVIDIA Tesla K20 persistence on GPU %i

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/bin/nvidia-smi --id=%i --persistence-mode=1
ExecStop=/usr/bin/nvidia-smi --id=%i --persistence-mode=0

[Install]
WantedBy=multi-user.target

We'll use the following command to enable persistence on our target GPU, replacing 0 with the GPU's index as reported by nvidia-smi:

sudo systemctl enable --now nvidia-tesla-k20-persistence@0.service

This will allow us to enable and disable persistence like any other service.
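
On a multi-GPU host, each GPU gets its own instance of the template. The sketch below builds the instance names for a list of GPU indices and prints the matching enable commands as a dry run. persistence_units is a hypothetical helper, and the indices 0 and 1 are placeholders.

```shell
# Hypothetical helper: print one instance name per GPU index given.
persistence_units() {
    for gpu in "$@"; do
        printf 'nvidia-tesla-k20-persistence@%s.service\n' "$gpu"
    done
}

# Dry run for GPUs 0 and 1: print the commands rather than executing them.
for unit in $(persistence_units 0 1); do
    echo "sudo systemctl enable --now $unit"
done
```

Stopping an instance with systemctl stop runs its ExecStop line, which turns persistence mode off for that GPU only.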

Low-power mode on boot

A second template lowers the GPU's power limit to 150 W at boot and restores it to 235 W when stopped. It requires the persistence unit for the same GPU, so that the limit stays applied. The following command creates the template file:

sudo systemctl edit --force --full [email protected]

Add the following to the template service file:

[Unit]
Description=Set NVIDIA Tesla K20 low-power mode on GPU %i
After=nvidia-tesla-k20-persistence@%i.service
Requires=nvidia-tesla-k20-persistence@%i.service

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/bin/nvidia-smi --id=%i --power-limit=150
ExecStop=/usr/bin/nvidia-smi --id=%i --power-limit=235

[Install]
WantedBy=multi-user.target

We'll use the following command to enable low-power mode on our target GPU at boot, replacing 0 with the GPU's index as reported by nvidia-smi:

sudo systemctl enable --now [email protected]

This will allow us to enable and disable low-power mode like any other service.
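
Before relying on the 150 W value, it is worth checking that it falls inside the range the card accepts, since nvidia-smi rejects limits outside it. This is a minimal sketch: limit_in_range is a hypothetical helper, GPU index 0 is a placeholder, and it assumes the card reports numeric minimum and maximum limits rather than [N/A].

```shell
# Hypothetical helper: exit 0 when a requested limit (watts) lies within
# the min/max range passed as the second and third arguments.
limit_in_range() {
    awk -v want="$1" -v min="$2" -v max="$3" \
        'BEGIN { exit !(want + 0 >= min + 0 && want + 0 <= max + 0) }'
}

if command -v nvidia-smi >/dev/null 2>&1; then
    # nounits strips the trailing " W" so the values are plain numbers.
    nvidia-smi --id=0 --query-gpu=power.min_limit,power.max_limit \
        --format=csv,noheader,nounits |
    while IFS=', ' read -r min max; do
        if limit_in_range 150 "$min" "$max"; then
            echo "150 W is within ${min}-${max} W"
        else
            echo "150 W is outside ${min}-${max} W" >&2
        fi
    done
fi
```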