Install drivers on a cloud server with a GPU
This guide uses the example of a cloud server created from the pre-built Ubuntu 24.04 LTS 64-bit image.
For stable operation, a cloud server with a GPU needs the NVIDIA® GPU drivers installed.
If you created the cloud server from a pre-built GPU-optimized image, the drivers are already installed and no additional installation is required. The GPU-optimized pre-built images are:
- Ubuntu 24.04 LTS 64-bit GPU driver;
- Ubuntu 24.04 LTS 64-bit CUDA 11.8 Docker;
- Ubuntu 24.04 LTS 64-bit CUDA 12.8 Docker;
- Ubuntu 22.04 LTS 64-bit GPU driver;
- Ubuntu 22.04 LTS 64-bit CUDA 11.8 Docker;
- Ubuntu 22.04 LTS 64-bit CUDA 12.8 Docker;
- Data Science VM (Ubuntu 22.04 LTS 64-bit);
- Data Analytics VM (Ubuntu 22.04 LTS 64-bit).
Install drivers
-
Install the ubuntu-drivers-common package:
sudo apt install -y ubuntu-drivers-common alsa-utils
-
Check the recommended driver version:
sudo ubuntu-drivers devices
The response lists the available driver versions, with the recommended one marked as recommended. Copy the recommended version.
Example for an NVIDIA® Tesla T4 GPU, where the recommended version is nvidia-driver-550:
== /sys/devices/pci0000:00/0000:00:06.0 ==
modalias : pci:v000010DEd00001EB8sv000010DEsd000012A2bc03sc02i00
vendor : NVIDIA Corporation
model : TU104GL [Tesla T4]
manual_install: True
driver : nvidia-driver-450-server - distro non-free
driver : nvidia-driver-535-server - distro non-free
driver : nvidia-driver-470-server - distro non-free
driver : nvidia-driver-470 - distro non-free
driver : nvidia-driver-550 - third-party non-free recommended
driver : nvidia-driver-418-server - distro non-free
driver : xserver-xorg-video-nouveau - distro free builtin
-
Optional: verify that the selected driver version is at least the minimum compatible version for the GPU architecture of the cloud server:
sudo apt-cache search 'nvidia-driver-*'
The response lists the driver versions available for installation. To find the GPU architecture, see the Create a Cloud Server with a GPU instructions. To check whether the driver version and the architecture match, see the CUDA Compatibility instructions in the NVIDIA® CUDA Compatibility documentation.
-
If your GPU architecture is Pascal (such as the NVIDIA® GTX 1080), add the NVIDIA® Personal Package Archive repository to the cloud server:
sudo add-apt-repository ppa:graphics-drivers/ppa -y
-
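Install the kernel headers:
sudo apt update
for kernel in $(linux-version list); do sudo apt install -y "linux-headers-${kernel}"; done
The loop installs the headers for every kernel version that linux-version list reports. To install the headers for a single kernel only, run sudo apt install -y linux-headers-<kernel-version>, where <kernel-version> is the kernel version. To view the list of kernel versions, run apt-cache search linux-image.
-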
Install the driver:
sudo apt install -y <driver_version>
Replace <driver_version> with the driver version you copied when checking the recommended version.
Example of installing the recommended version nvidia-driver-550 for an NVIDIA® Tesla T4 GPU:
sudo apt install -y nvidia-driver-550
-
Check that the driver is installed and working:
nvidia-smi
The response shows the NVIDIA-SMI version, the driver version, and the CUDA version that is compatible with the current driver but is not installed on the system. The CUDA Runtime API and CUDA Toolkit are installed separately and are not included in the nvidia-driver package.
Example response:
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15 Driver Version: 550.54.15 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 Tesla T4 Off | 00000000:00:06.0 Off | 0 |
| N/A 41C P8 10W / 70W | 0MiB / 15360MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
-
Open the configuration file of the unattended-upgrades package, which handles security updates:
sudo nano /etc/apt/apt.conf.d/50unattended-upgrades
-
Disable package updates for NVIDIA®. To do this, add a block to the file:
Unattended-Upgrade::Package-Blacklist {
"linux-";
"nvidia-";
};
-
Exit the nano text editor, saving your changes: press Ctrl+X, then Y, then Enter.
-
Optional: lock the kernel version to disable kernel updates. Updating the kernel version may cause errors in GPU drivers.
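The steps above that find and then install the recommended driver can be combined in a script. The following is a minimal sketch and not part of the original instructions: recommended_driver is a hypothetical helper name, and it assumes the output format shown in the Tesla T4 example, where the recommended line is marked with the word recommended.

```shell
#!/bin/sh
# Hypothetical helper (not part of ubuntu-drivers): reads the output of
# `ubuntu-drivers devices` on stdin and prints the package name from the
# first "driver" line that is marked "recommended".
recommended_driver() {
    awk '!found && $1 == "driver" && /recommended/ { print $3; found = 1 }'
}

# Usage on a real server (commented out because it needs root and a GPU):
#   driver="$(sudo ubuntu-drivers devices | recommended_driver)"
#   [ -n "$driver" ] && sudo apt install -y "$driver"
```

If no line is marked as recommended, the helper prints nothing, so the usage example skips the installation instead of passing an empty package name to apt.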
Lock the kernel version
In the pre-built images with pre-installed drivers, except for Data Analytics VM (Ubuntu 22.04 LTS 64-bit) and Data Science VM (Ubuntu 22.04 LTS 64-bit), the kernel version is already locked.
During installation, the drivers are compiled against the headers of the current kernel version, so changing the kernel version will cause the GPU driver to fail. In this case, the output of the nvidia-smi command may contain the following error:
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
To disable kernel updates, lock the kernel version in the apt package manager settings. After locking, you can still update the kernel version by removing the lock first.
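A monitoring script can detect this failure by looking for the error text in the nvidia-smi output. The sketch below is illustrative only: driver_comm_failed is a hypothetical helper name, and it assumes the error message is worded exactly as quoted above.

```shell
#!/bin/sh
# Hypothetical helper: reads nvidia-smi output on stdin and succeeds (exit 0)
# if it contains the communication error quoted above.
driver_comm_failed() {
    grep -q "couldn't communicate with the NVIDIA driver"
}

# Usage on a real server (commented out because it needs the driver tools):
#   if nvidia-smi 2>&1 | driver_comm_failed; then
#       echo "GPU driver is not loaded for kernel $(uname -r)" >&2
#   fi
```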
-
Open the CLI.
-
Create a pin-linux-kernel-nvidia-dkms file in the /etc/apt/preferences.d directory to lock the versions of the linux-headers and linux-image packages:
cat <<EOF > /etc/apt/preferences.d/pin-linux-kernel-nvidia-dkms
Package: linux-image-*
Pin: version *
Pin-Priority: -1

Package: linux-headers-*
Pin: version *
Pin-Priority: -1
EOF
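To confirm that the file contains what you expect, you can list the package patterns that it blocks. This is a minimal sketch under stated assumptions: blocked_patterns is a hypothetical helper name, and it only reads the Package and Pin-Priority fields of a preferences file, treating any negative priority as blocked. It does not ask apt how the pins are actually applied.

```shell
#!/bin/sh
# Hypothetical helper: prints the "Package:" pattern of every stanza in an
# apt preferences file whose "Pin-Priority:" is negative.
# $1 is the path to the preferences file.
blocked_patterns() {
    awk -F': *' '
        tolower($1) == "package"                { pkg = $2 }
        tolower($1) == "pin-priority" && $2 < 0 { print pkg }
    ' "$1"
}

# Usage on a real server:
#   blocked_patterns /etc/apt/preferences.d/pin-linux-kernel-nvidia-dkms
```

For the file created above, the helper is expected to print the linux-image-* and linux-headers-* patterns, one per line.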
Update the kernel version after locking
Once you lock the kernel version, you cannot update it. To get security updates, performance improvements, and new features, delete the kernel version lock file and then update the kernel.
-
Open the CLI.
-
Delete the file you created to lock the kernel version:
rm /etc/apt/preferences.d/pin-linux-kernel-nvidia-dkms
-
Update the kernel version:
apt install linux-image-<kernel-version>
Replace <kernel-version> with the kernel version. To view the list of kernel versions, run apt-cache search linux-image.
-
Install the kernel headers:
apt install linux-headers-$(uname -r)
Once the kernel headers are installed, the dkms utility will run and automatically rebuild the NVIDIA modules for the new kernel version.
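To check that the rebuild succeeded, you can inspect the output of dkms status. The sketch below is an assumption-laden example, not part of the original instructions: nvidia_module_built_for is a hypothetical helper name, and it assumes that dkms status prints one line per module that contains the module name, the kernel version, and the word installed. The exact line format differs between dkms versions. Note also that $(uname -r) expands to the version of the kernel that is currently running.

```shell
#!/bin/sh
# Hypothetical helper: reads `dkms status` output on stdin and succeeds
# (exit 0) if some line mentions nvidia, the given kernel version ($1),
# and the word "installed".
nvidia_module_built_for() {
    awk -v kernel="$1" '
        tolower($0) ~ /nvidia/ && index($0, kernel) && /installed/ { found = 1 }
        END { exit !found }
    '
}

# Usage on a real server (commented out because it needs dkms):
#   if dkms status | nvidia_module_built_for "$(uname -r)"; then
#       echo "NVIDIA module is built for the running kernel"
#   fi
```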