Prepare the host
Add the non-free repository
Debian 12, on which Proxmox 8 is based, does not ship proprietary drivers in its main or contrib components, so we'll add the non-free and non-free-firmware components to the host's APT sources:
sudo sed -i 's/contrib$/contrib non-free non-free-firmware/' /etc/apt/sources.list
Update the host
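Before refreshing the package lists, it may be worth confirming what the sed expression from the previous step actually matches. The sketch below replays it on a scratch file; the two sample lines are made up for illustration and are not taken from a real host:

```shell
# Try the substitution on a scratch file instead of /etc/apt/sources.list.
# The two sample lines are assumptions about what a sources.list may contain.
scratch=$(mktemp)
cat > "$scratch" <<'EOF'
deb http://deb.debian.org/debian bookworm main contrib
deb http://security.debian.org/debian-security bookworm-security main
EOF

# Same expression as the real command; only lines ending in "contrib" match.
sed -i 's/contrib$/contrib non-free non-free-firmware/' "$scratch"

# Running it a second time changes nothing, because the edited line no
# longer ends in "contrib".
sed -i 's/contrib$/contrib non-free non-free-firmware/' "$scratch"

cat "$scratch"
```

Lines that do not end in contrib are left untouched, so on the real host a quick `grep non-free /etc/apt/sources.list` shows whether the edit applied to the entries we care about.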
Before we begin, we'll make sure our system is up to date:
sudo apt-get update
sudo apt-get dist-upgrade -y
Reboot (optional)
If proxmox-kernel-* was updated, we should reboot our Proxmox host:
sudo reboot
Install the Tesla 470 driver
Install the NVIDIA Tesla 470 driver:
sudo apt-get install -y nvidia-tesla-470-driver
Install compatible CUDA Toolkit
To install a compatible version of the CUDA Toolkit, we must download the run file for CUDA Toolkit 11.4 from NVIDIA and execute it:
wget https://developer.download.nvidia.com/compute/cuda/11.4.0/local_installers/cuda_11.4.0_470.42.01_linux.run
sudo sh cuda_11.4.0_470.42.01_linux.run --override --toolkit --silent
Installer flags explanation
- The --override flag overrides the GCC version requirement
- The --silent flag automatically accepts the EULA
- The --toolkit flag tells the silent installer to install the CUDA Toolkit
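The run file does not add the toolkit to PATH on its own, so a quick check that nvcc is reachable can save confusion later. The sketch below assumes the installer used its default prefix, /usr/local/cuda-11.4; `CUDA_HOME` and the helper `path_prepend` are names chosen here for illustration:

```shell
# Assumption: the run file installed to its default prefix.
CUDA_HOME="${CUDA_HOME:-/usr/local/cuda-11.4}"

# Prepend a directory to PATH only if it is not already listed.
path_prepend() {
    case ":$PATH:" in
        *":$1:"*) ;;              # already present, nothing to do
        *) PATH="$1:$PATH" ;;
    esac
}

path_prepend "$CUDA_HOME/bin"
export PATH

if command -v nvcc >/dev/null 2>&1; then
    nvcc --version                # should report release 11.4 after a good install
else
    echo "nvcc not found under $CUDA_HOME/bin" >&2
fi
```

To keep the setting across logins, the same PATH addition could go in a shell profile; where exactly is a matter of preference.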
Enable persistence mode
Persistence mode keeps the NVIDIA driver loaded even when nothing is using the GPU, which is what lets settings such as the power limit stay in effect. The driver ships with nvidia-persistenced.service, but it does not work in this setup (for reasons we did not track down), so we will disable and mask it.
sudo systemctl disable --now nvidia-persistenced.service
sudo systemctl mask nvidia-persistenced.service
Until next boot
sudo nvidia-smi -pm 1
Persist after boot
Because persistence mode doesn't survive a reboot, we need a systemd unit to enable it at boot. We'll create a template unit so that persistence can be enabled or disabled per GPU on multi-GPU systems. The following command creates the new template file:
sudo systemctl edit --force --full nvidia-tesla-k20-persistence@.service
Add the following to the template service file:
[Unit]
Description=Set NVIDIA Tesla K20 persistence on GPU %i
[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/bin/nvidia-smi --id=%i --persistence-mode=1
ExecStop=/usr/bin/nvidia-smi --id=%i --persistence-mode=0
[Install]
WantedBy=multi-user.target
We'll use the following command to enable persistence on our target GPU, remembering to replace 0 with our GPU id in nvidia-smi:
sudo systemctl enable --now nvidia-tesla-k20-persistence@0.service
This will allow us to enable and disable persistence like any other service.
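On a multi-GPU host, typing one enable command per GPU gets tedious. As a rough sketch, the indices reported by nvidia-smi can be turned into instance names of the nvidia-tesla-k20-persistence@ template; the helper `unit_names` is a name made up here, and the loop only prints the commands rather than running them:

```shell
# Read GPU indices (one per line) on stdin and print one template
# instance name per index.
unit_names() {
    while IFS= read -r idx; do
        if [ -n "$idx" ]; then      # skip blank lines
            printf 'nvidia-tesla-k20-persistence@%s.service\n' "$idx"
        fi
    done
    return 0
}

if command -v nvidia-smi >/dev/null 2>&1; then
    # Print (not execute) an enable command for every GPU nvidia-smi reports.
    nvidia-smi --query-gpu=index --format=csv,noheader | unit_names |
        while IFS= read -r unit; do
            echo "sudo systemctl enable --now $unit"
        done
    # Show the current persistence state of each GPU.
    nvidia-smi --query-gpu=index,persistence_mode --format=csv
fi
```

Printing first makes it easy to review the list and drop any GPU that should stay out of persistence mode before running the commands by hand.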
Low-power mode on boot
sudo systemctl edit --force --full [email protected]
Add the following to the template service file:
[Unit]
Description=Set NVIDIA Tesla K20 low-power mode on GPU %i
After=nvidia-tesla-k20-persistence@%i.service
Requires=nvidia-tesla-k20-persistence@%i.service
[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/bin/nvidia-smi --id=%i --power-limit=150
ExecStop=/usr/bin/nvidia-smi --id=%i --power-limit=235
[Install]
WantedBy=multi-user.target
We'll use the following command to enable low-power mode on our target GPU at boot, remembering to replace 0 with our GPU id in nvidia-smi:
sudo systemctl enable --now [email protected]
This will allow us to enable and disable low-power mode like any other service.
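The 150 W and 235 W values in the unit are specific to one card, so before reusing them it seems sensible to compare the requested limit against the range the GPU itself reports. A minimal sketch, where `limit_in_range`, `gpu`, and `want` are illustrative names and the GPU index 0 is an assumption:

```shell
# Succeed if $1 (requested watts) lies within [$2, $3]; awk handles the
# decimal values that nvidia-smi prints.
limit_in_range() {
    awk -v w="$1" -v lo="$2" -v hi="$3" \
        'BEGIN { exit !((w + 0) >= (lo + 0) && (w + 0) <= (hi + 0)) }'
}

gpu=0       # hypothetical GPU index; use the one shown by nvidia-smi
want=150    # the limit used in ExecStart above

if command -v nvidia-smi >/dev/null 2>&1; then
    # Prints something like "150.00, 235.00" on cards that report limits.
    range=$(nvidia-smi --id="$gpu" \
        --query-gpu=power.min_limit,power.max_limit \
        --format=csv,noheader,nounits)
    min=${range%%,*}
    max=${range##*, }
    if limit_in_range "$want" "$min" "$max"; then
        echo "$want W is within $min-$max W"
    else
        echo "$want W is outside $min-$max W" >&2
    fi
fi
```

If the card does not report its limits, the parsed values will not be numeric and the comparison is meaningless, so the printed range should be read before trusting the verdict.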