Create a Managed Kubernetes cluster on a cloud server with a GPU

You can add GPUs (graphics processing units) to a Managed Kubernetes cluster on a cloud server either when you create the cluster or later, by adding a node group on a cloud server.

GPU availability in regions can be viewed in the GPU for Managed Kubernetes availability matrix.

On nodes with a GPU, you can use pre-installed drivers or install drivers yourself. For GPU node groups without pre-installed drivers, cluster autoscaling is unavailable.
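To check which nodes actually advertise GPUs to the Kubernetes scheduler, you can inspect the node objects returned by `kubectl get nodes -o json`. The sketch below assumes that GPUs are exposed under the extended resource name `nvidia.com/gpu`, which is the name the NVIDIA device plugin uses. Whether the pre-installed drivers in this service include the device plugin is an assumption, not something this guide states. The function names are hypothetical.

```python
from __future__ import annotations

import json
import subprocess

# Extended resource name advertised by the NVIDIA device plugin (assumed to run on GPU nodes).
GPU_RESOURCE = "nvidia.com/gpu"


def read_nodes_from_kubectl() -> str:
    """Return the raw JSON of all nodes, as printed by `kubectl get nodes -o json`."""
    return subprocess.run(
        ["kubectl", "get", "nodes", "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout


def gpu_allocatable(nodes_json: str) -> dict[str, int]:
    """Map each node name to the number of GPUs it reports as allocatable."""
    nodes = json.loads(nodes_json)
    result: dict[str, int] = {}
    for node in nodes.get("items", []):
        name = node["metadata"]["name"]
        allocatable = node.get("status", {}).get("allocatable", {})
        # A node without a working driver/device-plugin stack does not list the
        # resource at all, so it is reported as 0 GPUs.
        result[name] = int(allocatable.get(GPU_RESOURCE, "0"))
    return result
```

With `kubectl` configured for the cluster, `print(gpu_allocatable(read_nodes_from_kubectl()))` lists every node with its GPU count; a GPU node that reports 0 most likely has no working driver or device plugin yet.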

Create a cluster on a cloud server with a GPU

Use the Create a Managed Kubernetes cluster on a cloud server guide.

Select:

  • configuration — a fixed configuration of a node group with a GPU;
  • GPU drivers — the GPU Drivers toggle is enabled by default, and the cluster uses pre-installed drivers. To install GPU drivers yourself, disable the GPU Drivers toggle.
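Once the node group is running, a workload gets a GPU by requesting it in its resource limits. The sketch below builds a minimal Pod manifest as JSON, which `kubectl apply -f -` accepts on standard input. It assumes the `nvidia.com/gpu` resource name of the NVIDIA device plugin and assumes the NVIDIA container runtime makes `nvidia-smi` available inside the container; the pod name and image tag are illustrative only.

```python
import json

GPU_RESOURCE = "nvidia.com/gpu"  # assumed device-plugin resource name


def gpu_pod_manifest(name: str, image: str, gpus: int = 1) -> dict:
    """Build a minimal Pod manifest that asks the scheduler for `gpus` whole GPUs."""
    if gpus < 1:
        raise ValueError("gpus must be at least 1")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": name,
                    "image": image,
                    # Prints the GPUs visible inside the container, then exits.
                    "command": ["nvidia-smi"],
                    # GPUs are requested in `limits`; Kubernetes uses the same value as the request.
                    "resources": {"limits": {GPU_RESOURCE: str(gpus)}},
                }
            ],
        },
    }


if __name__ == "__main__":
    # Hypothetical pod name and image; choose a CUDA image compatible with the node's driver.
    manifest = gpu_pod_manifest("gpu-smoke-test", "nvidia/cuda:12.4.1-base-ubuntu22.04")
    print(json.dumps(manifest, indent=2))
```

For example, `python3 gpu_pod.py | kubectl apply -f -` followed by `kubectl logs gpu-smoke-test` shows whether the container can see the GPU. GPUs are requested as whole units, so the value is an integer count.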

Available GPUs

| GPU | Memory | CUDA cores | Tensor cores |
|---|---|---|---|
| NVIDIA® A100 40 GB | 40 GB HBM2 | 6912 | 432 |
| NVIDIA® A100 80 GB | 80 GB HBM2 | 6912 | 432 |
| NVIDIA® Tesla T4 | 16 GB GDDR6 | 2560 | 320 |
| NVIDIA® A30 | 24 GB HBM2 | 3804 | 224 |
| NVIDIA® A2 (an updated analog of the NVIDIA® Tesla T4) | 16 GB GDDR6 | 1280 | 40 |
| NVIDIA® GTX 1080 | 8 GB GDDR5X | 2560 | none |
| NVIDIA® RTX 2080 Ti | 11 GB GDDR6 | 4352 | 544 |
| NVIDIA® RTX 4090 24 GB | 24 GB GDDR6X | 16384 | 512 |
| NVIDIA® RTX 4090 48 GB | 48 GB GDDR6X | 16384 | 512 |
| NVIDIA® RTX 6000 Ada (L40 equivalent) | 48 GB GDDR6X | 18176 | 568 |
| NVIDIA® A2000 (RTX 3060 equivalent) | 6 GB GDDR6 | 3328 | 104 |
| NVIDIA® A5000 (RTX 3080 equivalent) | 24 GB GDDR6 | 8192 | 256 |
| NVIDIA® H100 | 80 GB HBM3 | 16896 | 528 |
| NVIDIA® H200 | 141 GB HBM3e | 16896 | 528 |
| NVIDIA® L4 | 24 GB GDDR6 | 20480 | 640 |
| NVIDIA® RTX 6000 Pro | 96 GB GDDR7 | 18432 | 576 |
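When choosing a configuration, a common first filter is GPU memory. The sketch below encodes the memory and Tensor-core figures from the Available GPUs table and returns the GPUs that meet a memory threshold, smallest first. The RTX 6000 Pro memory is taken as 96 GB, as stated in its own section. The `Gpu` class and `candidates` function are hypothetical helpers, and the list is a static copy, not a live inventory of what a region offers.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Gpu:
    name: str
    memory_gb: int
    tensor_cores: int  # 0 where no Tensor cores are listed


# Static copy of the table figures; check the control panel for current availability.
GPUS = [
    Gpu("A100 40 GB", 40, 432),
    Gpu("A100 80 GB", 80, 432),
    Gpu("Tesla T4", 16, 320),
    Gpu("A30", 24, 224),
    Gpu("A2", 16, 40),
    Gpu("GTX 1080", 8, 0),
    Gpu("RTX 2080 Ti", 11, 544),
    Gpu("RTX 4090 24 GB", 24, 512),
    Gpu("RTX 4090 48 GB", 48, 512),
    Gpu("RTX 6000 Ada", 48, 568),
    Gpu("A2000", 6, 104),
    Gpu("A5000", 24, 256),
    Gpu("H100", 80, 528),
    Gpu("H200", 141, 528),
    Gpu("L4", 24, 640),
    Gpu("RTX 6000 Pro", 96, 576),
]


def candidates(min_memory_gb: int, need_tensor_cores: bool = True) -> list[str]:
    """Names of GPUs with at least `min_memory_gb` of memory, smallest memory first."""
    matches = [
        g
        for g in GPUS
        if g.memory_gb >= min_memory_gb and (g.tensor_cores > 0 or not need_tensor_cores)
    ]
    return [g.name for g in sorted(matches, key=lambda g: (g.memory_gb, g.name))]


print(candidates(80))  # ['A100 80 GB', 'H100', 'RTX 6000 Pro', 'H200']
```

Sorting by memory first keeps the smallest (usually cheapest) sufficient GPU at the top; ties are broken by name only to make the order deterministic.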

You can view the current list of GPUs in the control panel: in the top menu, click Products → Managed Kubernetes → Create cluster → the Node groups step → Cloud server → Node configuration → Fixed with GPU.

You can check GPU availability in regions in the GPU for Managed Kubernetes availability matrix.

NVIDIA® A100 40 GB

Features maximum performance for AI, HPC, and data processing. Suitable for deep learning, scientific research, and data analytics.

Based on the Ampere® architecture, with a throughput of up to 1.5 TB/s. See detailed specifications of the NVIDIA® A100 40 GB in the NVIDIA® documentation.

In fixed Managed Kubernetes cluster configurations, 1 to 8 GPUs × 40 GB are available, with 6 to 48 vCPUs and 87 to 704 GB RAM.
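When a node hosts several GPUs, it can help to know how much CPU and RAM each GPU workload can get if resources are split evenly, for example to size pod requests. The sketch below does that arithmetic. It assumes, for illustration only, that the largest A100 40 GB configuration combines all the upper bounds above (8 GPUs, 48 vCPUs, 704 GB RAM); the function name is hypothetical.

```python
def per_gpu_share(gpus: int, vcpus: int, ram_gb: int, gpu_memory_gb: int) -> dict:
    """Split one node configuration's vCPUs and RAM evenly across its GPUs."""
    if gpus < 1:
        raise ValueError("a GPU configuration needs at least one GPU")
    return {
        "total_gpu_memory_gb": gpus * gpu_memory_gb,
        "vcpus_per_gpu": vcpus / gpus,
        "ram_gb_per_gpu": ram_gb / gpus,
    }


# Assumed largest A100 40 GB configuration: 8 GPUs, 48 vCPUs, 704 GB RAM.
print(per_gpu_share(8, 48, 704, 40))
# {'total_gpu_memory_gb': 320, 'vcpus_per_gpu': 6.0, 'ram_gb_per_gpu': 88.0}
```

In practice, leave some CPU and memory unrequested for system components running on the node, so actual per-pod requests should be somewhat below these even shares.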

NVIDIA® A100 80 GB

Features maximum performance for AI, HPC, and data processing, as well as a large amount of memory for resource-intensive tasks. Suitable for deep learning, scientific research, and data analytics.

Based on the Ampere® architecture, with 80 GB of HBM2 memory and a throughput of up to 1.5 TB/s. See detailed specifications of the NVIDIA® A100 80 GB in the NVIDIA® documentation.

In fixed Managed Kubernetes cluster configurations, 1 to 8 GPUs × 80 GB are available, with 12 to 192 vCPUs and 128 to 1,000 GB RAM.

NVIDIA® Tesla T4

Suitable for machine learning and deep learning, inference, graphics processing, and video rendering. It works with most AI frameworks and is compatible with all types of neural networks.

Based on the Turing® architecture, with a throughput of up to 300 GB/s. See detailed specifications of the NVIDIA® Tesla T4 in the NVIDIA® documentation.

In fixed Managed Kubernetes cluster configurations, 1 to 4 GPUs × 16 GB are available, with 4 to 24 vCPUs and 32 to 320 GB RAM.

NVIDIA® A30

Suitable for AI inference, HPC, natural language processing, conversational AI, and recommendation systems.

Based on the Ampere® architecture, with a throughput of up to 933 GB/s. See detailed specifications of the NVIDIA® A30 in the NVIDIA® documentation.

In fixed Managed Kubernetes cluster configurations, 1 to 2 GPUs × 24 GB are available, with 16 to 48 vCPUs and 64 to 320 GB RAM.

NVIDIA® A2

An entry-level GPU. Suitable for simple inference, video and graphics, Edge AI (edge computing), edge video, and mobile cloud gaming.

Based on the Ampere® architecture, with a throughput of up to 200 GB/s. See detailed specifications of the NVIDIA® A2 in the NVIDIA® documentation.

In fixed Managed Kubernetes cluster configurations, 1 to 4 GPUs × 16 GB are available, with 12 to 48 vCPUs and 32 to 320 GB RAM.

NVIDIA® GTX 1080

A high-performance, energy-efficient GPU built with FinFET process technology and GDDR5X memory. Dynamic load balancing distributes tasks so that resources do not sit idle. It delivers high performance for displays, VR, ultra-high resolutions, and data processing.

Based on the Pascal® architecture, with a throughput of up to 320 GB/s. See detailed specifications of the NVIDIA® GTX 1080 in the NVIDIA® documentation.

In fixed Managed Kubernetes cluster configurations, 1 to 8 GPUs × 8 GB are available, with 8 to 28 vCPUs and 24 to 96 GB RAM.

NVIDIA® RTX 2080 Ti

High-performance GPU for complex graphics tasks. Suitable for:

  • high-resolution video processing;
  • 3D modeling;
  • rendering and photo editing;
  • neural network training;
  • complex artificial intelligence computations;
  • large data volume processing.

Based on the Turing® architecture, with a throughput of up to 616 GB/s. See detailed specifications of the NVIDIA® RTX 2080 Ti in the NVIDIA® documentation.

Fixed Managed Kubernetes cluster configurations offer from 1 to 4 GPUs × 11 GB, with vCPUs from 2 to 48 and RAM from 32 to 320 GB.

NVIDIA® RTX 4090 24 GB

A high-performance GeForce series GPU. Suitable for professional design and 3D modeling, video production, rendering, ML tasks (training and model inference), working with large language models (LLMs), as well as scientific and engineering calculations (e.g., climate modeling or bioinformatics).

Based on the Ada Lovelace® architecture, with a throughput of up to 1008 GB/s. See detailed specifications of the NVIDIA® RTX 4090 in the NVIDIA® documentation.

Fixed Managed Kubernetes cluster configurations offer from 1 to 4 GPUs × 24 GB, with vCPUs from 4 to 64 and RAM from 16 to 356 GB.

NVIDIA® RTX 4090 48 GB

A high-performance GeForce series GPU with more memory than the NVIDIA® RTX 4090 24 GB, suitable for:

  • professional design and 3D modeling;
  • video production and rendering;
  • ML tasks (training and model inference);
  • working with large language models (LLMs);
  • scientific and engineering calculations (e.g., climate modeling or bioinformatics).

Based on the Ada Lovelace® architecture, with 48 GB of GDDR6X memory and a throughput of up to 1008 GB/s. See detailed specifications of the NVIDIA® RTX 4090 in the NVIDIA® documentation.

Fixed Managed Kubernetes cluster configurations offer from 1 to 8 GPUs × 48 GB, with vCPUs from 12 to 192, RAM from 64 to 896 GB, and a local disk from 64 to 800 GB.

NVIDIA® RTX 6000 Ada

A professional GPU that combines high computing and graphics performance. Suitable for ML tasks, rendering, scientific computing, and high-performance visualization.

Based on the Ada Lovelace® architecture, with 48 GB of GDDR6X memory and a throughput of up to 960 GB/s. See detailed specifications of the NVIDIA® RTX 6000 Ada in the NVIDIA® documentation.

Fixed Managed Kubernetes cluster configurations offer from 1 to 4 GPUs × 48 GB, with vCPUs from 12 to 96 and RAM from 64 to 450 GB.

NVIDIA® A2000

An energy-efficient GPU for compact workstations. Suitable for AI, graphics, and video rendering.

Based on the Ampere® architecture, with a throughput of up to 288 GB/s. See detailed specifications of the NVIDIA® A2000 in the NVIDIA® documentation.

Fixed Managed Kubernetes cluster configurations offer from 1 to 4 GPUs × 6 GB, with vCPUs from 6 to 24 and RAM from 16 to 320 GB.

NVIDIA® A5000

A versatile GPU suitable for a wide range of tasks within its performance class.

Based on the Ampere® architecture, with a throughput of up to 768 GB/s. See detailed specifications of the NVIDIA® A5000 in the NVIDIA® documentation.

Fixed Managed Kubernetes cluster configurations offer from 1 to 2 GPUs × 24 GB, with vCPUs from 8 to 48 and RAM from 32 to 320 GB.

NVIDIA® H100

A powerful GPU suitable for AI, HPC, and scalable computing.

Based on the Hopper™ architecture, with 80 GB of HBM3 memory and a throughput of up to 3 TB/s. See detailed specifications of the NVIDIA® H100 in the NVIDIA® documentation.
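Memory throughput matters for LLM inference because, in single-stream decoding, generating each token typically requires reading all model weights from GPU memory once. Dividing throughput by the size of the weights therefore gives a rough upper bound on tokens per second. The sketch below computes that bound; it ignores KV-cache reads, batching, and compute limits, and the 8-billion-parameter FP16 model is a hypothetical example.

```python
def max_decode_tokens_per_s(bandwidth_gb_s: float, params_billion: float,
                            bytes_per_param: float) -> float:
    """Rough upper bound on single-stream decode speed: throughput / bytes of weights per token."""
    if params_billion <= 0 or bytes_per_param <= 0:
        raise ValueError("model size must be positive")
    # 1e9 parameters * bytes per parameter = that many GB (1 GB = 1e9 bytes here).
    weights_gb = params_billion * bytes_per_param
    return bandwidth_gb_s / weights_gb


# Hypothetical: 8B-parameter model in FP16 (2 bytes/param) on a GPU with 3 TB/s = 3000 GB/s.
print(round(max_decode_tokens_per_s(3000, 8, 2), 1))  # 187.5
```

Real throughput per request is lower than this bound, but the ratio is a useful way to compare GPUs for memory-bound inference: halving the bytes per parameter (for example, with 8-bit weights) roughly doubles the bound.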

Fixed Managed Kubernetes cluster configurations offer from 1 to 2 GPUs × 80 GB, with vCPUs from 12 to 48 and RAM from 128 to 256 GB.

NVIDIA® H200

A professional GPU for:

  • accelerating generative AI;
  • high-performance computing (HPC);
  • inference of large language models (LLMs);
  • fine-tuning models;
  • image and video generation.

Based on the Hopper™ architecture, with 141 GB of HBM3e memory and a throughput of up to 4.8 TB/s. See detailed specifications of the NVIDIA® H200 in the NVIDIA® documentation.

Fixed Managed Kubernetes cluster configurations offer from 1 to 8 GPUs × 141 GB, with vCPUs from 12 to 192 and RAM from 120 GB to 1 TB.
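To estimate how many GPUs an LLM needs, a common rule of thumb is parameter count × bytes per parameter for the weights, plus headroom for the KV cache, activations, and the CUDA context. The sketch below applies that rule; the 20% headroom and the 70-billion-parameter model are assumptions for illustration, and the function name is hypothetical.

```python
import math


def min_gpus_for_weights(params_billion: float, bytes_per_param: float,
                         gpu_memory_gb: float, headroom: float = 0.2) -> int:
    """Smallest GPU count whose combined memory holds the weights plus a headroom fraction."""
    weights_gb = params_billion * bytes_per_param
    # Assumed extra room for KV cache, activations, and runtime overhead.
    needed_gb = weights_gb * (1 + headroom)
    return math.ceil(needed_gb / gpu_memory_gb)


# Hypothetical 70B-parameter model on 141 GB GPUs:
print(min_gpus_for_weights(70, 2, 141))  # FP16: 140 GB weights, 168 GB with headroom -> 2
print(min_gpus_for_weights(70, 1, 141))  # 8-bit: 70 GB weights, 84 GB with headroom -> 1
```

Splitting a model across several GPUs also requires a serving framework that supports tensor or pipeline parallelism, and long contexts or large batches can need much more KV-cache memory than a flat 20% allows.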

NVIDIA® L4

A versatile GPU for accelerating AI/ML workloads, video processing, streaming, and VDI. Suitable for running modern large language models (LLMs) and multimodal models.

Based on the Ada Lovelace® architecture, with 24 GB of GDDR6 memory and a throughput of up to 300 GB/s. See detailed specifications of the NVIDIA® L4 in the NVIDIA® documentation.

Fixed Managed Kubernetes cluster configurations offer from 1 to 8 GPUs × 24 GB, with vCPUs from 8 to 128 and RAM from 32 GB to 512 GB.

NVIDIA® RTX 6000 Pro

A professional GPU for:

  • accelerating generative AI;
  • inference of large language models (LLMs);
  • fine-tuning models;
  • image and video generation;
  • 3D rendering and video processing.

Based on the Blackwell® architecture, with 96 GB of GDDR7 memory and a throughput of up to 1.6 TB/s. See detailed specifications of the NVIDIA® RTX 6000 Pro in the NVIDIA® documentation.

Fixed Managed Kubernetes cluster configurations offer from 1 to 8 GPUs × 96 GB, with vCPUs from 16 to 256 and RAM from 120 GB to 1 TB.