Prepare the host
Add the non-free repository
Debian 12, on which Proxmox 8 is based, does not include proprietary drivers in its main or contrib components, so we'll add the non-free and non-free-firmware components to the host:
sudo sed -i 's/contrib$/contrib non-free non-free-firmware/' /etc/apt/sources.list
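Before editing the live file, it can help to preview what the substitution does. The sketch below runs the same sed expression without `-i` against a scratch file. `preview_nonfree` is a hypothetical helper name, and the sample line assumes a stock entry that ends in contrib; the real file may look different.

```shell
# Hypothetical helper: print the result of the substitution
# without modifying the file passed as $1.
preview_nonfree() {
    sed 's/contrib$/contrib non-free non-free-firmware/' "$1"
}

# Scratch file with an assumed stock entry (the real file may differ).
sample=$(mktemp)
printf 'deb http://ftp.debian.org/debian bookworm main contrib\n' > "$sample"

preview_nonfree "$sample"
rm -f "$sample"

# To preview the real file instead:
# preview_nonfree /etc/apt/sources.list
```

Because the pattern is anchored to lines ending in contrib, running it a second time leaves already-edited lines alone. Lines that do not end in contrib are not touched either, so those would need a manual edit.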
Update the host
Before we begin, we'll make sure our system is up to date:
sudo apt-get update
sudo apt-get dist-upgrade -y
Reboot (optional)
If proxmox-kernel-* was updated, we should reboot our Proxmox host:
sudo reboot
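One way to decide whether the reboot is needed is to compare the running kernel with the newest image on disk. This is a minimal sketch, assuming kernel images are stored as /boot/vmlinuz-<release> and that GNU `sort -V` is available. `needs_reboot` is a hypothetical helper, not part of Proxmox.

```shell
# Hypothetical helper: succeed when the newest kernel image in a directory
# differs from the running kernel release.
#   $1: running kernel release (e.g. the output of `uname -r`)
#   $2: directory holding vmlinuz-<release> images (normally /boot)
needs_reboot() {
    newest=$(
        for image in "$2"/vmlinuz-*; do
            if [ -e "$image" ]; then
                # Strip the directory and the "vmlinuz-" prefix.
                printf '%s\n' "${image##*/vmlinuz-}"
            fi
        done | sort -V | tail -n 1
    )
    [ -n "$newest" ] && [ "$1" != "$newest" ]
}

if needs_reboot "$(uname -r)" /boot; then
    echo "A different kernel image is installed; reboot recommended."
else
    echo "No newer kernel image found."
fi
```

The check only looks at file names, so a pinned or manually selected kernel would still be reported as a mismatch.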
Install the Tesla 470 driver
Install the NVIDIA Tesla 470 driver from the non-free component we just enabled:
sudo apt-get install -y nvidia-tesla-470-driver
Install compatible CUDA Toolkit
To install a compatible version of the CUDA Toolkit, we must download the run file for CUDA Toolkit 11.4 from NVIDIA and execute it:
wget https://developer.download.nvidia.com/compute/cuda/11.4.0/local_installers/cuda_11.4.0_470.42.01_linux.run
sudo sh cuda_11.4.0_470.42.01_linux.run --override --toolkit --silent
Installer flags explanation
- The --override flag overrides the GCC version requirement
- The --silent flag automatically accepts the EULA
- The --toolkit flag tells the silent installer to install the CUDA Toolkit
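The runfile does not put the toolkit on the PATH by itself, so a quick follow-up check can be useful. The sketch below assumes the default install prefix of /usr/local/cuda-11.4 (adjust `CUDA_HOME` if a different location was chosen). `path_prepend` is a hypothetical helper.

```shell
# Assumed default prefix for the CUDA 11.4 runfile; override if needed.
CUDA_HOME=${CUDA_HOME:-/usr/local/cuda-11.4}

# Hypothetical helper: prepend a directory to PATH unless it is already there.
path_prepend() {
    case ":$PATH:" in
        *":$1:"*) ;;              # already present, nothing to do
        *) PATH="$1:$PATH" ;;
    esac
}

path_prepend "$CUDA_HOME/bin"
export PATH

if command -v nvcc >/dev/null 2>&1; then
    nvcc --version
else
    echo "nvcc not found under $CUDA_HOME/bin" >&2
fi
```

This only affects the current shell. To make it permanent, the same lines could go in a shell profile.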
Enable persistence mode
Persistence mode keeps the NVIDIA driver initialized even when no process is using the GPU, which is what lets settings such as the power limit stick. The driver ships with nvidia-persistenced.service, but it did not work for us, so we will disable and mask it.
sudo systemctl disable --now nvidia-persistenced.service
sudo systemctl mask nvidia-persistenced.service
Until next boot
sudo nvidia-smi -pm 1
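To confirm the setting took effect, nvidia-smi can be queried for each GPU's persistence state. This is a minimal sketch. `not_persistent` is a hypothetical helper that reads "index, persistence_mode" CSV rows and prints the indices of GPUs where the mode is not enabled.

```shell
# Hypothetical helper: read "index, persistence_mode" rows on stdin and
# print the index of every GPU whose mode is not "Enabled".
not_persistent() {
    awk -F', *' '$2 != "Enabled" { print $1 }'
}

if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=index,persistence_mode --format=csv,noheader |
        not_persistent
else
    echo "nvidia-smi not found; is the driver installed?" >&2
fi
```

No output means every GPU reports persistence mode as enabled.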
Persist after boot
Because persistence mode doesn't survive a reboot, we need a systemd unit to enable it at boot. We'll create a service template so that persistence can be enabled or disabled per GPU on multi-GPU systems. The following command creates the new template file:
sudo systemctl edit --force --full nvidia-tesla-k20-persistence@.service
Add the following to the template service file:
[Unit]
Description=Set NVIDIA Tesla K20 persistence on GPU %i
[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/bin/nvidia-smi --id=%i --persistence-mode=1
ExecStop=/usr/bin/nvidia-smi --id=%i --persistence-mode=0
[Install]
WantedBy=multi-user.target
We'll use the following command to enable persistence on our target GPU, remembering to replace 0 with our GPU id in nvidia-smi:
sudo systemctl enable --now nvidia-tesla-k20-persistence@0.service
This will allow us to enable and disable persistence like any other service.
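In a template unit, systemd substitutes %i with whatever follows the @ in the instance name, so each GPU index maps to its own service. The sketch below prints (but does not run) the enable command for every index nvidia-smi reports. `persistence_unit` and `enable_commands` are hypothetical helpers, and the unit name is the one referenced by the low-power template's After= line.

```shell
# Hypothetical helper: build the instance name for one GPU index.
persistence_unit() {
    printf 'nvidia-tesla-k20-persistence@%s.service\n' "$1"
}

# Hypothetical helper: read GPU indices on stdin, one per line, and print
# the matching enable command for each without executing it.
enable_commands() {
    while read -r idx; do
        if [ -n "$idx" ]; then
            printf 'systemctl enable --now %s\n' "$(persistence_unit "$idx")"
        fi
    done
}

if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=index --format=csv,noheader | enable_commands
else
    echo "nvidia-smi not found; cannot list GPU indices" >&2
fi
```

Printing the commands first makes it easy to review them and drop any GPU that should stay out of persistence mode before running them with sudo.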
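Low-power mode on boot
As with persistence, we'll create a systemd service template, this time one that caps the GPU's power limit at boot: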
sudo systemctl edit --force --full [email protected]
Add the following to the template service file:
[Unit]
Description=Set NVIDIA Tesla K20 low-power mode on GPU %i
After=nvidia-tesla-k20-persistence@%i.service
Requires=nvidia-tesla-k20-persistence@%i.service
[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/bin/nvidia-smi --id=%i --power-limit=150
ExecStop=/usr/bin/nvidia-smi --id=%i --power-limit=235
[Install]
WantedBy=multi-user.target
We'll use the following command to enable low-power mode on our target GPU at boot, remembering to replace 0 with our GPU id in nvidia-smi:
sudo systemctl enable --now [email protected]
This will allow us to enable and disable low-power mode like any other service.
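nvidia-smi rejects power limits outside the range the card supports, so it can be worth checking the requested value first. This is a minimal sketch, assuming the card reports numeric values for its power.min_limit and power.max_limit query fields. `limit_in_range` is a hypothetical helper, and the GPU index of 0 is an assumption.

```shell
# Hypothetical helper: succeed when a requested power limit (watts)
# lies within [min, max]. Decimal inputs such as 150.00 are accepted.
limit_in_range() {
    awk -v w="$1" -v lo="$2" -v hi="$3" \
        'BEGIN { exit !((w + 0 >= lo + 0) && (w + 0 <= hi + 0)) }'
}

requested=150   # value used in ExecStart above
gpu=0           # assumed GPU index

if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --id="$gpu" --query-gpu=power.min_limit,power.max_limit \
        --format=csv,noheader,nounits |
    while IFS=', ' read -r lo hi; do
        if limit_in_range "$requested" "$lo" "$hi"; then
            echo "${requested} W is within ${lo}-${hi} W"
        else
            echo "${requested} W is outside ${lo}-${hi} W" >&2
        fi
    done
else
    echo "nvidia-smi not found; skipping the range check" >&2
fi
```

If the card reports a narrower range than expected, the values in ExecStart and ExecStop would need adjusting to match.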