Run a GPU-accelerated application in a Docker container on a cloud server

Docker containers can be used on cloud servers with GPUs to flexibly manage GPU-accelerated applications without needing to set up an additional environment.

A containerized environment will allow you to:

  • use resources efficiently: applications that would otherwise need separate environments on different servers can run side by side on one server;
  • avoid issues with CUDA Toolkit versioning for your applications.

Selectel offers ready-to-use server images with preinstalled GPU drivers and Docker for running GPU-accelerated applications in containers:

  • Ubuntu 24.04 LTS 64-bit GPU Driver 535 Docker;
  • Ubuntu 24.04 LTS 64-bit GPU Driver 580 Docker;
  • Ubuntu 22.04 LTS 64-bit GPU Driver 535 Docker;
  • Ubuntu 22.04 LTS 64-bit GPU Driver 580 Docker.

Requirements for the cloud server

The cloud server must have:

  • a configuration with a GPU;
  • a source image with preinstalled GPU drivers and Docker;
  • a network volume or local disk larger than 40 GB.
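
The requirements above can be checked on the server before any containers are started. The following is a minimal sketch and not part of any Selectel tooling: the helper name `check_requirements`, the choice of the root filesystem `/` as the disk to measure, and the reading of "40 GB" as 40 * 10**9 bytes are assumptions made for illustration.

```python
import shutil

MIN_DISK_BYTES = 40 * 10**9  # assumption: "40 GB" read as decimal gigabytes


def check_requirements(disk_total_bytes, tools_found):
    """Return a list of human-readable problems; an empty list means OK.

    disk_total_bytes: total size of the disk that will hold the images.
    tools_found: mapping of tool name -> path or None (as from shutil.which).
    """
    problems = []
    if disk_total_bytes <= MIN_DISK_BYTES:
        problems.append("disk is not larger than 40 GB")
    for tool, path in tools_found.items():
        if path is None:
            problems.append(f"{tool} not found on PATH")
    return problems


if __name__ == "__main__":
    # Assumption: Docker images are stored on the root filesystem.
    total = shutil.disk_usage("/").total
    tools = {name: shutil.which(name) for name in ("nvidia-smi", "docker")}
    for message in check_requirements(total, tools) or ["all checks passed"]:
        print(message)
```

The check only confirms that `nvidia-smi` and `docker` are on the PATH. It does not confirm that the driver or the Docker daemon actually works, which is what the steps below verify.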

Run a GPU-accelerated application in a Docker container on a server

  1. Run the pytorch-cuda sample in a Docker container.

  2. Create a custom Docker image with CUDA.

1. Run the pytorch-cuda sample in a Docker container

Run PyTorch inside a Docker container with GPU support.

  1. Open the CLI.

  2. Make sure the GPU on the server is working correctly:

    nvidia-smi

    The response shows the NVIDIA-SMI version, the driver version, and the CUDA version. The CUDA version is the one compatible with the current driver; it is not necessarily installed on the system. For example:

    +-----------------------------------------------------------------------------------------+
    | NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
    |-----------------------------------------+------------------------+----------------------+
    | GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
    |                                         |                        |               MIG M. |
    |=========================================+========================+======================|
    |   0  Tesla T4                       Off |   00000000:00:06.0 Off |                    0 |
    | N/A   41C    P8             10W /   70W |       0MiB /  15360MiB |      0%      Default |
    |                                         |                        |                  N/A |
    +-----------------------------------------+------------------------+----------------------+

    +-----------------------------------------------------------------------------------------+
    | Processes:                                                                              |
    |  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
    |        ID   ID                                                               Usage      |
    |=========================================================================================|
    |  No running processes found                                                             |
    +-----------------------------------------------------------------------------------------+
  3. Run a PyTorch container from the NVIDIA NGC container registry (nvcr.io):

    sudo docker run --gpus all -it --rm nvcr.io/nvidia/pytorch:<pytorch_version>-py3 bash

    Specify <pytorch_version>: the PyTorch image version. In the NGC catalog, image tags use the release format YY.MM, for example 24.05.

  4. Make sure that CUDA and the GPU are available to PyTorch in the container. Start the Python interpreter with python3 and run:

    import torch

    print("CUDA Available: ", torch.cuda.is_available())
    print("Number of GPUs: ", torch.cuda.device_count())

    Example output:

    CUDA Available: True
    Number of GPUs: 1
  5. Make sure that CUDA Runtime 12.1 is installed in the container, as it is required to run the current version of PyTorch:

    conda list | grep cud

    Example output:

    libcudnn9-cuda-12 9.1.1.17 0 nvidia
    cuda-cudart 12.1.105 0 nvidia
    cuda-cupti 12.1.105 0 nvidia
    cuda-libraries 12.1.0 0 nvidia
    cuda-nvrtc 12.1.105 0 nvidia
    cuda-nvtx 12.1.105 0 nvidia
    cuda-opencl 12.3.101 0 nvidia
    cuda-runtime 12.1.0 0 nvidia

    You do not need to install CUDA Runtime on the server OS: the container ships its own CUDA Runtime, and the server only provides the GPU driver.
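
The two version checks in this section (the CUDA version reported by nvidia-smi on the server and the CUDA Runtime version inside the container) can be compared programmatically. The sketch below is an illustration only: the function names are hypothetical, and the rule it applies, that the CUDA version reported by the driver should be at least the major.minor version of the runtime in the container, is a conservative assumption. NVIDIA's actual compatibility rules are more permissive in some cases.

```python
import re


def parse_smi_header(text):
    """Extract (driver_version, cuda_version) strings from nvidia-smi output."""
    driver = re.search(r"Driver Version:\s*([\d.]+)", text)
    cuda = re.search(r"CUDA Version:\s*([\d.]+)", text)
    if not driver or not cuda:
        raise ValueError("nvidia-smi header not recognized")
    return driver.group(1), cuda.group(1)


def major_minor(version):
    """Turn a version such as '12.1.105' into (12, 1) for comparison."""
    parts = version.split(".")
    return int(parts[0]), int(parts[1])


def driver_covers_runtime(driver_cuda, runtime_version):
    """Assumed rule: driver-reported CUDA version >= runtime major.minor."""
    return major_minor(driver_cuda) >= major_minor(runtime_version)


# Header line taken from the example nvidia-smi output in step 2.
header = "| NVIDIA-SMI 550.54.15 Driver Version: 550.54.15 CUDA Version: 12.4 |"
driver, cuda = parse_smi_header(header)
print(driver, cuda)                             # 550.54.15 12.4
# cuda-cudart version taken from the example conda output in step 5.
print(driver_covers_runtime(cuda, "12.1.105"))  # True
```

On a real server, the text passed to `parse_smi_header` could come from `subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout` instead of the hard-coded example line.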

2. Create a custom Docker image with CUDA

  1. Run the ready-to-use container:

    docker run --gpus all -it --rm nvcr.io/nvidia/cuda:12.8.1-cudnn-devel-ubuntu24.04

    The container will have compatible versions of CUDA Toolkit, CUDA Runtime, and libcudnn preinstalled:

    cuda-cudart-12-8 12.8.90-1 amd64 CUDA Runtime native Libraries
    cuda-nvcc-12-8 12.8.93-1 amd64 CUDA nvcc
    cuda-toolkit-config-common 12.8.90-1 all Common config package for CUDA Toolkit.
    libcudnn9-cuda-12 9.8.0.87-1 amd64 cuDNN runtime libraries for CUDA 12.8
  2. Install Python 3, pip, and TensorFlow:

    apt update && apt -y install python3 python3-pip
    python3 -m pip config set global.break-system-packages true
    python3 -m pip install tensorflow
  3. Make sure that the GPU is available in the Docker container:

    python3 -c "import tensorflow as tf; print(tf.reduce_sum(tf.random.normal([1000, 1000]))); print('Available GPUs:', tf.config.list_physical_devices('GPU'))"

    Example output:

    I0000 00:00:1743408862.613883 910 gpu_device.cc:2019] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 4287 MB memory: -> device: 0, name: NVIDIA RTX A2000, pci bus id: 0000:00:06.0, compute capability: 8.6
    tf.Tensor(-1418.5072, shape=(), dtype=float32)
    Available GPUs: [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
  4. Exit the shell without stopping the container: press Ctrl + P, and then Ctrl + Q.

  5. Check that the container is running:

    docker ps -a

    Example output:

    CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
    20d557a37bdd nvcr.io/nvidia/cuda:12.8.1-cudnn-devel-ubuntu24.04 "/opt/nvidia/nvidia_…" 24 minutes ago Up 24 minutes nifty_shtern

    In the CONTAINER ID column, copy the ID of the container you ran in step 1.

  6. Create the image:

    docker commit <container_id> <image_tag>

    Specify:

    • <container_id>: the container ID you copied in step 5;
    • <image_tag>: the tag for the new image, in the name:tag format.

    If the image was created, the image hash will be displayed. Example output:

    sha256:a7ff970295e5dd37ef441fcf0462752715c95cece2729ddcc774a8aaa0773bce
  7. Create and run a custom container from the image:

    docker run --gpus all --rm -it <image_tag> bash

    Specify <image_tag>: the image tag you created in step 6.

    Here --gpus all makes the server's GPUs available in the container, as in step 1, and --rm removes the container after you exit its bash shell.
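
Steps 6 and 7 can also be scripted. The following is a minimal sketch with hypothetical helper names (`commit_command`, `run_command`) and a hypothetical image tag (`my-cuda-tf:latest`). It only builds the command lines and prints them. The `--gpus all` option is included on the assumption that the custom container needs GPU access in the same way as the container in step 1.

```python
import shlex


def commit_command(container_id, image_tag):
    """Build the argv for saving a container as an image (step 6)."""
    return ["docker", "commit", container_id, image_tag]


def run_command(image_tag, gpus="all", shell="bash"):
    """Build the argv for starting a container from the image (step 7)."""
    cmd = ["docker", "run", "--rm", "-it"]
    if gpus:
        # Assumption: the custom container should see the server's GPUs.
        cmd += ["--gpus", gpus]
    return cmd + [image_tag, shell]


# Container ID from the example output in step 5; the tag is hypothetical.
print(shlex.join(commit_command("20d557a37bdd", "my-cuda-tf:latest")))
print(shlex.join(run_command("my-cuda-tf:latest")))
```

To execute a command rather than print it, the list can be passed to `subprocess.run(..., check=True)`. Building the arguments as a list avoids shell quoting problems with unusual tags.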