Install NVIDIA Tesla K20x drivers & CUDA Toolkit on Proxmox 8.3

18 February 2025

Prepare the host

Add the non-free repository

Debian 12, on which Proxmox 8 is based, does not ship proprietary drivers in its main or contrib components, so we'll add the non-free and non-free-firmware components to the host's APT sources:

sudo sed -i 's/contrib$/contrib non-free non-free-firmware/' /etc/apt/sources.list
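
To see what the substitution does before touching the real file, one option is to run it against a scratch copy first. The sketch below uses a made-up two-line sample, not the host's actual sources.list; the mirror URLs in it are illustrative.

```shell
# Try the substitution on a scratch file first (sample content is illustrative).
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
deb http://ftp.debian.org/debian bookworm main contrib
deb http://security.debian.org/debian-security bookworm-security main contrib
EOF

# Same expression as above; it only matches lines that END in "contrib".
sed -i 's/contrib$/contrib non-free non-free-firmware/' "$tmp"
# A second run changes nothing, because no line ends in "contrib" any more.
sed -i 's/contrib$/contrib non-free non-free-firmware/' "$tmp"

cat "$tmp"
```

Lines that do not end in "contrib" (for example, entries that only list main) are left untouched and would need editing by hand.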

Update the host

Before we begin, we'll make sure our system is up to date:

sudo apt-get update
sudo apt-get dist-upgrade -y

Reboot (optional)

If proxmox-kernel-* was updated, we should reboot our Proxmox host:

sudo reboot
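
If it isn't obvious whether the kernel changed, a rough check is to compare the running kernel with the newest kernel image on disk. This is a sketch that assumes kernel images are named vmlinuz-<version> under /boot; the helper names newest_kernel and reboot_needed are made up for illustration.

```shell
# Print the newest kernel version found in a boot directory (default /boot).
newest_kernel() {
    ls "${1:-/boot}" 2>/dev/null | sed -n 's/^vmlinuz-//p' | sort -V | tail -n 1
}

# Succeed (exit 0) when an installed kernel was found and it differs from the running one.
reboot_needed() {
    running=$1
    newest=$2
    [ -n "$newest" ] && [ "$running" != "$newest" ]
}

if reboot_needed "$(uname -r)" "$(newest_kernel)"; then
    echo "reboot recommended: running $(uname -r), newest installed $(newest_kernel)"
else
    echo "running kernel is current"
fi
```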

Install the Tesla 470 driver

The K20x is a Kepler card, and driver branches newer than 470 no longer support Kepler, so we install the Tesla 470 driver package:

sudo apt-get install -y nvidia-tesla-470-driver
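
Once the package is installed, a quick way to confirm the driver can see the card is to ask nvidia-smi for the GPU name and driver version. A minimal sketch follows; check_driver is a made-up helper, and its optional argument exists only so a different command can be substituted for testing.

```shell
# Report what the driver's CLI sees, or fail if the CLI is missing.
check_driver() {
    smi=${1:-nvidia-smi}
    if command -v "$smi" >/dev/null 2>&1; then
        # index, name and driver_version are standard --query-gpu fields
        "$smi" --query-gpu=index,name,driver_version --format=csv,noheader
    else
        echo "$smi not found; is nvidia-tesla-470-driver installed?" >&2
        return 1
    fi
}

check_driver || echo "driver check failed"
```

If the command is present but reports no devices, a reboot is usually the first thing to try so the freshly built kernel module gets loaded.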

Install compatible CUDA Toolkit

To install a compatible version of the CUDA Toolkit, we must download the run file for CUDA Toolkit 11.4 from NVIDIA and execute it:

wget https://developer.download.nvidia.com/compute/cuda/11.4.0/local_installers/cuda_11.4.0_470.42.01_linux.run
sudo sh cuda_11.4.0_470.42.01_linux.run --override --toolkit --silent

Installer flags explanation

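The --override flag skips the installer's compiler (GCC) version check
The --silent flag runs the installer without prompts, which also accepts the EULA
The --toolkit flag tells the silent installer to install only the CUDA Toolkit, leaving the driver we installed from apt in place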
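
The runfile does not put the toolkit on our PATH. The sketch below assumes the installer's default prefix for this release, /usr/local/cuda-11.4, and uses a made-up helper, path_prepend, to add its bin directory only once.

```shell
# Assumed install prefix: the runfile's default for this release.
CUDA_HOME=${CUDA_HOME:-/usr/local/cuda-11.4}

# Prepend a directory to PATH unless it is already there.
path_prepend() {
    case ":$PATH:" in
        *":$1:"*) ;;                 # already present, nothing to do
        *) PATH="$1:$PATH" ;;
    esac
}

path_prepend "$CUDA_HOME/bin"
export PATH

# nvcc only exists once the toolkit is installed, so guard the call.
if command -v nvcc >/dev/null 2>&1; then
    nvcc --version
else
    echo "nvcc not found under $CUDA_HOME/bin"
fi
```

To make this permanent, the same lines could go in a shell startup file such as ~/.bashrc.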

Enable persistence mode

Persistence mode keeps the driver loaded even when nothing is using the GPU, so settings such as the power limit are not reset between uses. The driver package ships nvidia-persistenced.service, but it did not work on this host, so we will disable it and mask it.

sudo systemctl disable --now nvidia-persistenced.service
sudo systemctl mask nvidia-persistenced.service

Until next boot

sudo nvidia-smi -pm 1
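
To confirm the setting took effect, we can query the persistence_mode field for every GPU. The sketch below lists the indexes of any GPUs that are not in persistence mode; gpus_without_persistence is a made-up helper that reads nvidia-smi's CSV output on stdin.

```shell
# Read "index, state" CSV lines on stdin; print indexes whose state is not Enabled.
gpus_without_persistence() {
    awk -F', *' '$2 != "Enabled" { print $1 }'
}

# Only query when the driver's CLI is available.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=index,persistence_mode --format=csv,noheader \
        | gpus_without_persistence
fi
```

No output means every GPU the driver can see has persistence mode enabled.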

Persist after boot

Because persistence mode doesn't survive a reboot, we need a systemd unit to enable it at boot. We'll create a template unit so that persistence can be enabled or disabled per GPU on multi-GPU systems. The following command creates the new template file:

sudo systemctl edit --force --full nvidia-tesla-k20-persistence@.service

Add the following to the template service file:

[Unit]
Description=Set NVIDIA Tesla K20 persistence on GPU %i

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/bin/nvidia-smi --id=%i --persistence-mode=1
ExecStop=/usr/bin/nvidia-smi --id=%i --persistence-mode=0

[Install]
WantedBy=multi-user.target

We'll use the following command to enable persistence on our target GPU, remembering to replace 0 with the GPU's index as reported by nvidia-smi:

sudo systemctl enable --now nvidia-tesla-k20-persistence@0.service

This will allow us to enable and disable persistence like any other service.
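
On a multi-GPU host the same template can be instantiated once per card. The loop below is a sketch: the indexes 0 and 1 are hypothetical, persistence_unit is a made-up helper, and by default it only prints the commands it would run.

```shell
# Build the instance unit name for a GPU index.
persistence_unit() {
    printf 'nvidia-tesla-k20-persistence@%s.service\n' "$1"
}

# Dry run by default: print the commands instead of running them.
# Set RUN=1 to execute them (needs root).
for gpu in 0 1; do          # hypothetical GPU indexes; use the ones nvidia-smi lists
    cmd="systemctl enable --now $(persistence_unit "$gpu")"
    if [ "${RUN:-0}" = 1 ]; then
        sudo $cmd
    else
        echo "$cmd"
    fi
done
```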

Low-power mode on boot

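We'll create a second template, which we'll call nvidia-tesla-k20-low-power@.service, to lower the power limit at boot:

sudo systemctl edit --force --full nvidia-tesla-k20-low-power@.service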

Add the following to the template service file:

[Unit]
Description=Set NVIDIA Tesla K20 low-power mode on GPU %i
After=nvidia-tesla-k20-persistence@%i.service
Requires=nvidia-tesla-k20-persistence@%i.service

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/bin/nvidia-smi --id=%i --power-limit=150
ExecStop=/usr/bin/nvidia-smi --id=%i --power-limit=235

[Install]
WantedBy=multi-user.target

We'll use the following command to enable low-power mode on our target GPU at boot, remembering to replace 0 with our GPU id in nvidia-smi:

sudo systemctl enable --now nvidia-tesla-k20-low-power@0.service

This will allow us to enable and disable low-power mode like any other service. Stopping the service runs ExecStop, which sets the limit back to 235 W.
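
nvidia-smi rejects a power limit outside the range the card supports, so it can help to check the range before settling on a value. The sketch below queries the power.min_limit and power.max_limit fields for GPU 0 and compares them with the 150 W used above; limit_in_range is a made-up helper.

```shell
# Succeed when lo <= wanted <= hi; awk handles the decimal values nvidia-smi prints.
limit_in_range() {
    awk -v w="$1" -v lo="$2" -v hi="$3" 'BEGIN { exit !(w >= lo && w <= hi) }'
}

wanted=150   # watts, the value used in ExecStart above
if command -v nvidia-smi >/dev/null 2>&1; then
    # Query only GPU 0 here; adjust --id for other cards.
    range=$(nvidia-smi --id=0 --query-gpu=power.min_limit,power.max_limit \
                --format=csv,noheader,nounits)
    min=${range%%,*}      # text before the first comma
    max=${range##*, }     # text after the last ", "
    if limit_in_range "$wanted" "$min" "$max"; then
        echo "$wanted W is within $min-$max W"
    else
        echo "$wanted W is outside $min-$max W"
    fi
fi
```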

Bibliography

CUDA Toolkit 11.4 Downloads | NVIDIA Developer

systemd - Arch Wiki
