
Create a Managed Kubernetes cluster on a cloud server with GPUs


You can add GPUs (graphics processing units) to a Managed Kubernetes cluster on a cloud server, either when you create the cluster or when you add a node group on a cloud server.

To see GPU availability across regions, see the GPU availability matrix for Managed Kubernetes.

Nodes with GPUs can use pre-installed drivers, or you can install the drivers yourself. Automatic cluster scaling is not available for GPU node groups without pre-installed drivers.
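If you install the drivers yourself, the nodes also need the NVIDIA device plugin before pods can request GPUs. A minimal sketch of the usual steps (the manifest URL and version tag are assumptions, not taken from this document; check the NVIDIA k8s-device-plugin releases for the current one):

```shell
# Deploy the NVIDIA device plugin DaemonSet (version tag is illustrative)
kubectl apply -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.14.1/nvidia-device-plugin.yml

# Verify that GPU nodes now advertise the nvidia.com/gpu resource
kubectl describe nodes | grep -A 2 'nvidia.com/gpu'
```

These commands assume a working kubeconfig for the cluster and drivers already present on the nodes; with pre-installed drivers the control panel handles this for you.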

Create a cluster on a cloud server with GPUs

Follow the instructions in Create a Managed Kubernetes cluster on a cloud server.

Select:

  • configuration - a fixed configuration of a node group with GPUs;
  • GPU drivers - the **GPU Drivers** toggle is enabled by default, and the cluster uses pre-installed drivers. To install GPU drivers yourself, turn off the **GPU Drivers** toggle.
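Once the cluster is created, workloads request GPUs through the standard extended resource exposed by the NVIDIA device plugin. A minimal pod-spec sketch (the pod name and CUDA image tag are illustrative, not taken from this document):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test            # illustrative name
spec:
  restartPolicy: Never
  containers:
    - name: cuda
      image: nvidia/cuda:12.2.0-base-ubuntu22.04   # illustrative image tag
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1       # resource name advertised by the NVIDIA device plugin
```

Apply the manifest with kubectl and check the pod logs: if the node's drivers are working, `nvidia-smi` prints the GPU model and driver version.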

Available GPUs

| GPU | Memory | CUDA cores | Tensor cores |
| --- | --- | --- | --- |
| NVIDIA® A100 40Gb | 40 GB HBM2 | 6912 | 432 |
| NVIDIA® A100 80Gb | 80 GB HBM2 | 6912 | 432 |
| NVIDIA® Tesla T4 | 16 GB GDDR6 | 2560 | 320 |
| NVIDIA® A30 | 24 GB HBM2 | 3804 | 224 |
| NVIDIA® A2 (updated analog of NVIDIA® Tesla T4) | 16 GB GDDR6 | 1280 | 40 |
| NVIDIA® GTX 1080 | 8 GB GDDR5X | 2560 | none |
| NVIDIA® RTX 2080 Ti | 11 GB GDDR6 | 4352 | 544 |
| NVIDIA® RTX 4090 | 24 GB GDDR6X | 16384 | 512 |
| NVIDIA® RTX 6000 Ada (analog of L40) | 48 GB GDDR6X | 18176 | 568 |
| NVIDIA® A2000 (analog of RTX 3060) | 6 GB GDDR6 | 3328 | 104 |
| NVIDIA® A5000 (analog of RTX 3080) | 24 GB GDDR6 | 8192 | 256 |
| NVIDIA® H100 | 80 GB HBM3 | 16896 | 528 |
| NVIDIA® H200 | 141 GB HBM3e | 16896 | 528 |
| NVIDIA® L4 | 24 GB GDDR6 | 20480 | 640 |

You can view the current GPU list in the control panel: in the top menu, click Products → Managed Kubernetes → Create Cluster → the Node Group stage → Cloud Server → Node Configuration → Fixed with GPU.


NVIDIA® A100 40Gb

Offers maximum performance for AI, HPC and data processing. Suitable for deep learning, scientific research and data analytics.

Based on Ampere® architecture, with 40 GB HBM2 memory and up to 1.5 TB/s of bandwidth. See NVIDIA® documentation for detailed specifications.

In fixed Managed Kubernetes cluster configurations, 1 to 8 GPUs × 40 GB are available, with vCPUs from 6 to 48, RAM from 87 to 704 GB.

NVIDIA® A100 80Gb

Offers maximum performance for AI, HPC, and data processing, as well as large memory capacity for compute-intensive tasks. Suitable for deep learning, scientific research, and data analytics.

Based on Ampere® architecture, with 80 GB HBM2 memory and up to 1.5 TB/s of bandwidth. See detailed specifications in NVIDIA® documentation.

In fixed Managed Kubernetes cluster configurations, 1 to 8 GPUs × 80 GB are available, with vCPUs from 12 to 192, RAM from 128 to 1,000 GB.

NVIDIA® Tesla T4

Suitable for Machine Learning and Deep Learning, inference, graphics and video rendering. Works with most AI frameworks and is compatible with all types of neural networks.

Based on Turing® architecture, up to 300 GB/s throughput. See NVIDIA® documentation for detailed specifications.

In fixed Managed Kubernetes cluster configurations, 1 to 4 GPUs × 16 GB are available, with vCPUs from 4 to 24, RAM from 32 to 320 GB.

NVIDIA® A30

Suitable for AI inference, HPC, language processing, conversational AI, and recommender systems.

Based on Ampere® architecture, up to 933 GB/s throughput. See NVIDIA® documentation for detailed specifications.

In fixed Managed Kubernetes cluster configurations, 1 to 2 GPUs × 24 GB are available, with vCPUs from 16 to 48, RAM from 64 to 320 GB.

NVIDIA® A2

An entry-level GPU. Suitable for simple inference, video and graphics, Edge AI (edge computing), Edge video, mobile cloud gaming.

Based on Ampere® architecture, up to 200 GB/s throughput. See NVIDIA® documentation for detailed specifications.

In fixed Managed Kubernetes cluster configurations, 1 to 4 GPUs × 16 GB are available, with vCPUs from 12 to 48, RAM from 32 to 320 GB.

NVIDIA® GTX 1080

High-performance and energy-efficient GPU, built on FinFET process technology with GDDR5X memory. Dynamic load balancing divides tasks so resources don't sit idle. Maximizes performance for display, VR, ultra-high-resolution settings, and data processing.

Based on Pascal® architecture, up to 320 GB/s throughput. See NVIDIA® documentation for detailed specifications.

In fixed Managed Kubernetes cluster configurations, 1 to 8 GPUs × 8 GB are available, with vCPUs from 8 to 28, RAM from 24 to 96 GB.

NVIDIA® RTX 2080 Ti

High-performance GPU for demanding graphics tasks. Suitable for high-resolution video processing, 3D modeling, rendering, and photo processing. Also suitable for training neural networks, performing complex AI computations, and processing large amounts of data.

Based on Turing® architecture, up to 616 GB/s throughput. See NVIDIA® documentation for detailed specifications.

In fixed Managed Kubernetes cluster configurations, 1 to 4 GPUs × 11 GB are available, with vCPUs from 2 to 48, RAM from 32 to 320 GB.

NVIDIA® RTX 4090

The highest-performing GPU in the GeForce series. Suitable for professional design and 3D modeling, video, rendering, ML tasks (model training and inference), large language models (LLMs), and scientific and engineering computing (e.g., climate modeling or bioinformatics).

Based on Ada Lovelace® architecture, up to 1008 GB/s throughput. See NVIDIA® documentation for detailed specifications.

In fixed Managed Kubernetes cluster configurations, 1 to 4 GPUs × 24 GB are available, with vCPUs from 4 to 64, RAM from 16 to 356 GB.

NVIDIA® RTX 6000 Ada

Professional GPU for computing and graphics power. Suitable for ML tasks, rendering, scientific computing and high-performance visualization.

Based on Ada Lovelace® architecture, with 48 GB GDDR6X memory and up to 960 GB/s of bandwidth. See detailed specifications in NVIDIA® documentation.

In fixed Managed Kubernetes cluster configurations, 1 to 4 GPUs × 48 GB are available, with vCPUs from 12 to 96, RAM from 64 to 450 GB.

NVIDIA® A2000

Power-efficient GPU for compact workstations. Suitable for AI, graphics and video rendering.

Based on Ampere® architecture, up to 288 GB/s bandwidth. See NVIDIA® documentation for detailed specifications.

In fixed Managed Kubernetes cluster configurations, 1 to 4 GPUs × 6 GB are available, with vCPUs from 6 to 24, RAM from 16 to 320 GB.

NVIDIA® A5000

A versatile GPU, suitable for any task within its performance limits.

Based on Ampere® architecture, up to 768 GB/s throughput. See NVIDIA® documentation for detailed specifications.

In fixed Managed Kubernetes cluster configurations, 1 to 2 GPUs × 24 GB are available, with vCPUs from 8 to 48, RAM from 32 to 320 GB.

NVIDIA® H100

A powerful GPU that is suitable for AI, HPC and scalable computing.

Based on Hopper™ architecture, with 80 GB HBM3 memory and up to 3 TB/s of bandwidth. See detailed specifications in NVIDIA® documentation.

In fixed Managed Kubernetes cluster configurations, 1 to 2 GPUs × 80 GB are available, with vCPUs from 12 to 48, RAM from 128 to 256 GB.

NVIDIA® H200

Professional GPU for accelerating generative AI, HPC, large language model (LLM) inference, model fine-tuning, and image and video generation.

Based on Hopper™ architecture, with 141 GB HBM3e memory and up to 4.8 TB/s of bandwidth. See detailed specifications in NVIDIA® documentation.

In fixed Managed Kubernetes cluster configurations, 1 to 8 GPUs × 141 GB are available, with vCPUs from 12 to 192, RAM from 120 GB to 1 TB.

NVIDIA® L4

Versatile GPU for accelerating AI/ML workloads, video processing, streaming, and VDI. Suitable for running large language models (LLMs) and multimodal models.

Based on Ada Lovelace® architecture, with 24 GB GDDR6 memory and up to 300 GB/s of bandwidth. See detailed specifications in NVIDIA® documentation.

In fixed Managed Kubernetes cluster configurations, 1 to 8 GPUs × 24 GB are available, with vCPUs from 8 to 128, RAM from 32 GB to 512 GB.