Create a Managed Kubernetes cluster on a cloud server with a GPU

You can add GPUs (graphics processing units) to a Managed Kubernetes cluster on a cloud server either when you create the cluster or by adding a node group on a cloud server to an existing cluster.

To check GPU availability by region, see the GPU for Managed Kubernetes availability matrix.

On nodes with a GPU, you can use pre-installed drivers or install drivers yourself. For GPU node groups without drivers, cluster autoscaling is unavailable.

Create a cluster on a cloud server with a GPU

Use the Create a Managed Kubernetes cluster on a cloud server guide.

Select:

configuration — a fixed configuration of a node group with a GPU;
GPU drivers — the GPU Drivers toggle is enabled by default, and the cluster uses pre-installed drivers. To install GPU drivers yourself, disable the GPU Drivers toggle.
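Once a GPU node group is running with drivers in place, a workload usually asks for a GPU through its container resource limits. The sketch below builds such a Pod manifest in Python and prints it as JSON. It assumes the GPUs are exposed under the extended resource name nvidia.com/gpu (the name used by the NVIDIA device plugin; this guide does not state it). The pod name and image are hypothetical placeholders.

```python
import json


def gpu_pod_manifest(name: str, image: str, gpus: int = 1) -> dict:
    """Build a minimal Pod manifest that requests `gpus` whole GPUs."""
    if gpus < 1:
        raise ValueError("gpus must be a positive integer")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": name,
                    "image": image,  # hypothetical image that ships nvidia-smi
                    "command": ["nvidia-smi"],
                    # Assumption: GPUs are advertised as "nvidia.com/gpu".
                    "resources": {"limits": {"nvidia.com/gpu": gpus}},
                }
            ],
        },
    }


manifest = gpu_pod_manifest("gpu-check", "registry.example.com/cuda-base:latest")
print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and applied with kubectl apply -f, since kubectl accepts JSON manifests as well as YAML.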

Available GPUs

GPU	Memory	CUDA cores	Tensor cores
NVIDIA® A100 40 GB	40 GB HBM2	6912	432
NVIDIA® A100 80 GB	80 GB HBM2	6912	432
NVIDIA® Tesla T4	16 GB GDDR6	2560	320
NVIDIA® A30	24 GB HBM2	3584	224
NVIDIA® A2 (an updated analog of NVIDIA® Tesla T4)	16 GB GDDR6	1280	40
NVIDIA® GTX 1080	8 GB GDDR5X	2560	✗
NVIDIA® RTX 2080 Ti	11 GB GDDR6	4352	544
NVIDIA® RTX 4090 24 GB	24 GB GDDR6X	16384	512
NVIDIA® RTX 4090 48 GB	48 GB GDDR6X	16384	512
NVIDIA® RTX 6000 Ada (L40 equivalent)	48 GB GDDR6X	18176	568
NVIDIA® A2000 (RTX 3060 equivalent)	6 GB GDDR6	3328	104
NVIDIA® A5000 (RTX 3080 equivalent)	24 GB GDDR6	8192	256
NVIDIA® H100	80 GB HBM3	16896	528
NVIDIA® H200	141 GB HBM3e	16896	528
NVIDIA® L4	24 GB GDDR6	20480	640
NVIDIA® RTX 6000 Pro	96 GB GDDR7	18432	576
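To shortlist GPUs from the table by memory size, the rows can be encoded as data and filtered. The sketch below is illustrative only: it copies a subset of the rows above, and the Gpu type and candidates function are hypothetical names, not part of any Managed Kubernetes API.

```python
from typing import List, NamedTuple, Optional


class Gpu(NamedTuple):
    name: str
    memory_gb: int
    cuda_cores: int
    tensor_cores: Optional[int]  # None where the table shows no Tensor cores


# A subset of the rows from the table above.
GPUS = [
    Gpu("NVIDIA Tesla T4", 16, 2560, 320),
    Gpu("NVIDIA A2", 16, 1280, 40),
    Gpu("NVIDIA GTX 1080", 8, 2560, None),
    Gpu("NVIDIA RTX 2080 Ti", 11, 4352, 544),
    Gpu("NVIDIA RTX 4090 24 GB", 24, 16384, 512),
    Gpu("NVIDIA H100", 80, 16896, 528),
    Gpu("NVIDIA H200", 141, 16896, 528),
]


def candidates(min_memory_gb: int, need_tensor_cores: bool = False) -> List[Gpu]:
    """Return GPUs with enough memory, smallest memory first."""
    matches = [
        gpu
        for gpu in GPUS
        if gpu.memory_gb >= min_memory_gb
        and (gpu.tensor_cores is not None or not need_tensor_cores)
    ]
    return sorted(matches, key=lambda gpu: (gpu.memory_gb, gpu.cuda_cores))


for gpu in candidates(16, need_tensor_cores=True):
    print(f"{gpu.name}: {gpu.memory_gb} GB, {gpu.cuda_cores} CUDA cores")
```

Sorting by memory first puts the smallest adequate GPU at the top of the shortlist, which is usually the starting point when sizing a node group.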

You can view the current list of GPUs in the control panel: in the top menu, click Products → Managed Kubernetes → Create cluster → the Node groups step → Cloud server Node configuration → Fixed with GPU.

You can check GPU availability by region in the GPU for Managed Kubernetes availability matrix.

NVIDIA® A100 40 GB

Features maximum performance for AI, HPC, and data processing. Suitable for deep learning, scientific research, and data analytics.

Based on the Ampere® architecture, with memory bandwidth of up to 1.5 TB/s. See detailed specifications of the NVIDIA® A100 40 GB in the NVIDIA® documentation.

In fixed Managed Kubernetes cluster configurations, 1 to 8 GPUs × 40 GB are available, with 6 to 48 vCPUs, and 87 to 704 GB RAM.

NVIDIA® A100 80 GB

Features maximum performance for AI, HPC, and data processing, as well as a large amount of memory for resource-intensive tasks. Suitable for deep learning, scientific research, and data analytics.

Based on the Ampere® architecture, with 80 GB of HBM2 memory and memory bandwidth of up to 1.5 TB/s. See detailed specifications of the NVIDIA® A100 80 GB in the NVIDIA® documentation.

In fixed Managed Kubernetes cluster configurations, 1 to 8 GPUs × 80 GB are available, with 12 to 192 vCPUs, and 128 to 1,000 GB RAM.

NVIDIA® Tesla T4

Suitable for machine learning and deep learning, inference, graphics processing, and video rendering. It works with most AI frameworks and is compatible with all types of neural networks.

Based on the Turing® architecture, with memory bandwidth of up to 300 GB/s. See detailed specifications of the NVIDIA® Tesla T4 in the NVIDIA® documentation.

In fixed Managed Kubernetes cluster configurations, 1 to 4 GPUs × 16 GB are available, with 4 to 24 vCPUs, and 32 to 320 GB RAM.

NVIDIA® A30

Suitable for AI inference, HPC, natural language processing, conversational AI, and recommendation systems.

Based on the Ampere® architecture, with memory bandwidth of up to 933 GB/s. See detailed specifications of the NVIDIA® A30 in the NVIDIA® documentation.

In fixed Managed Kubernetes cluster configurations, 1 to 2 GPUs × 24 GB are available, with 16 to 48 vCPUs, and 64 to 320 GB RAM.

NVIDIA® A2

An entry-level GPU. Suitable for simple inference, video and graphics, edge AI (edge computing), edge video, and mobile cloud gaming.

Based on the Ampere® architecture, with memory bandwidth of up to 200 GB/s. See detailed specifications of the NVIDIA® A2 in the NVIDIA® documentation.

In fixed Managed Kubernetes cluster configurations, 1 to 4 GPUs × 16 GB are available, with 12 to 48 vCPUs, and 32 to 320 GB RAM.
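The throughput figures quoted for each GPU describe memory bandwidth, so they give a rough feel for memory-bound workloads. As a back-of-the-envelope sketch, the time to read the whole GPU memory once is the memory size divided by the bandwidth. The function name is hypothetical, and the result is an idealized lower bound that ignores caches, compute time, and access patterns.

```python
def full_sweep_ms(memory_gb: float, bandwidth_gb_s: float) -> float:
    """Idealized time in milliseconds to read all GPU memory once."""
    if bandwidth_gb_s <= 0:
        raise ValueError("bandwidth must be positive")
    return memory_gb / bandwidth_gb_s * 1000


# Figures from this guide: both GPUs have 16 GB of memory.
t4_ms = full_sweep_ms(16, 300)  # NVIDIA Tesla T4, up to 300 GB/s
a2_ms = full_sweep_ms(16, 200)  # NVIDIA A2, up to 200 GB/s

print(f"Tesla T4: {t4_ms:.1f} ms per full memory sweep")
print(f"A2: {a2_ms:.1f} ms per full memory sweep")
```

With the same 16 GB of memory, the Tesla T4 needs about 53 ms per sweep and the A2 about 80 ms, so the difference in bandwidth matters mainly for workloads that stream large amounts of data per step.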

NVIDIA® GTX 1080

A high-performance and energy-efficient GPU built on FinFET technology with GDDR5X memory. Dynamic load balancing distributes tasks so that resources do not sit idle. It is designed for high performance in display output, VR, ultra-high-resolution graphics, and data processing.

Based on the Pascal® architecture, with memory bandwidth of up to 320 GB/s. See detailed specifications of the NVIDIA® GTX 1080 in the NVIDIA® documentation.

In fixed Managed Kubernetes cluster configurations, 1 to 8 GPUs × 8 GB are available, with 8 to 28 vCPUs, and 24 to 96 GB RAM.

NVIDIA® RTX 2080 Ti

High-performance GPU for complex graphics tasks. Suitable for:

high-resolution video processing;
3D modeling;
rendering and photo editing;
neural network training;
complex artificial intelligence computations;
large data volume processing.

Based on the Turing® architecture, with memory bandwidth of up to 616 GB/s. See detailed specifications of the NVIDIA® RTX 2080 Ti in the NVIDIA® documentation.

Fixed Managed Kubernetes cluster configurations offer from 1 to 4 GPUs × 11 GB, with vCPUs from 2 to 48 and RAM from 32 to 320 GB.

NVIDIA® RTX 4090 24 GB

A high-performance GeForce series GPU. Suitable for professional design and 3D modeling, video production, rendering, ML tasks (training and model inference), working with large language models (LLMs), as well as scientific and engineering calculations (e.g., climate modeling or bioinformatics).

Based on the Ada Lovelace® architecture, with memory bandwidth of up to 1008 GB/s. See detailed specifications of the NVIDIA® RTX 4090 in the NVIDIA® documentation.

Fixed Managed Kubernetes cluster configurations offer from 1 to 4 GPUs × 24 GB, with vCPUs from 4 to 64 and RAM from 16 to 356 GB.

NVIDIA® RTX 4090 48 GB

A high-performance GeForce series GPU with more memory than the NVIDIA® RTX 4090 24 GB, suitable for:

professional design and 3D modeling;
video production and rendering;
ML tasks (training and model inference);
working with large language models (LLMs);
scientific and engineering calculations (e.g., climate modeling or bioinformatics).

Based on the Ada Lovelace® architecture, with 48 GB of GDDR6X memory and memory bandwidth of up to 1008 GB/s. See detailed specifications of the NVIDIA® RTX 4090 in the NVIDIA® documentation.

Fixed Managed Kubernetes cluster configurations offer from 1 to 8 GPUs × 48 GB, with vCPUs from 12 to 192, RAM from 64 to 896 GB, and a local disk from 64 to 800 GB.

NVIDIA® RTX 6000 Ada

A professional GPU with high compute and graphics performance. Suitable for ML tasks, rendering, scientific computing, and high-performance visualization.

Based on the Ada Lovelace® architecture, with 48 GB of GDDR6X memory and memory bandwidth of up to 960 GB/s. See detailed specifications of the NVIDIA® RTX 6000 Ada in the NVIDIA® documentation.

Fixed Managed Kubernetes cluster configurations offer from 1 to 4 GPUs × 48 GB, with vCPUs from 12 to 96 and RAM from 64 to 450 GB.

NVIDIA® A2000

An energy-efficient GPU for compact workstations. Suitable for AI, graphics, and video rendering.

Based on the Ampere® architecture, with memory bandwidth of up to 288 GB/s. See detailed specifications of the NVIDIA® A2000 in the NVIDIA® documentation.

Fixed Managed Kubernetes cluster configurations offer from 1 to 4 GPUs × 6 GB, with vCPUs from 6 to 24 and RAM from 16 to 320 GB.

NVIDIA® A5000

A versatile GPU suitable for any tasks within its performance range.

Based on the Ampere® architecture, with memory bandwidth of up to 768 GB/s. See detailed specifications of the NVIDIA® A5000 in the NVIDIA® documentation.

Fixed Managed Kubernetes cluster configurations offer from 1 to 2 GPUs × 24 GB, with vCPUs from 8 to 48 and RAM from 32 to 320 GB.

NVIDIA® H100

A powerful GPU suitable for AI, HPC, and scalable computing.

Based on the Hopper™ architecture, with 80 GB of HBM3 memory and memory bandwidth of up to 3 TB/s. See detailed specifications of the NVIDIA® H100 in the NVIDIA® documentation.

Fixed Managed Kubernetes cluster configurations offer from 1 to 2 GPUs × 80 GB, with vCPUs from 12 to 48 and RAM from 128 to 256 GB.

NVIDIA® H200

A professional GPU for:

accelerating generative AI;
high-performance computing (HPC);
inference of large language models (LLMs);
fine-tuning models;
image and video generation.

Based on the Hopper™ architecture, with 141 GB of HBM3e memory and memory bandwidth of up to 4.8 TB/s. See detailed specifications of the NVIDIA® H200 in the NVIDIA® documentation.

Fixed Managed Kubernetes cluster configurations offer from 1 to 8 GPUs × 141 GB, with vCPUs from 12 to 192 and RAM from 120 GB to 1 TB.
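When sizing a node group for LLM inference, a common first check is whether the model weights fit into the combined GPU memory of one node. The sketch below is a rough estimate under stated assumptions: weights take parameters × bytes per parameter, and a 20% overhead factor stands in for activations and caches. The function name, the overhead value, and the 70-billion-parameter model are all hypothetical; the GPU memory sizes and per-node GPU limits are the ones listed in this guide for the H200 and H100.

```python
import math
from typing import Optional


def gpus_needed(
    params_billion: float,
    bytes_per_param: float,
    gpu_memory_gb: float,
    max_gpus_per_node: int,
    overhead: float = 0.2,  # assumption: 20% on top of the weights
) -> Optional[int]:
    """Return the number of GPUs required, or None if one node is not enough."""
    # 1 billion parameters at 1 byte each is roughly 1 GB.
    weights_gb = params_billion * bytes_per_param
    required_gb = weights_gb * (1 + overhead)
    count = math.ceil(required_gb / gpu_memory_gb)
    return count if count <= max_gpus_per_node else None


# Hypothetical 70B-parameter model stored in 16-bit precision (2 bytes each).
print(gpus_needed(70, 2, gpu_memory_gb=141, max_gpus_per_node=8))  # H200
print(gpus_needed(70, 2, gpu_memory_gb=80, max_gpus_per_node=2))   # H100
```

Under these assumptions the model needs about 168 GB, which fits on 2 H200 GPUs but exceeds what the 2-GPU H100 configurations provide, so the second call returns None.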

NVIDIA® L4

A versatile GPU for accelerating AI/ML workloads, video processing, streaming, and VDI. Suitable for running modern language models (LLMs) and multimodal models.

Based on the Ada Lovelace® architecture, with 24 GB of GDDR6 memory and memory bandwidth of up to 300 GB/s. See detailed specifications of the NVIDIA® L4 in the NVIDIA® documentation.

Fixed Managed Kubernetes cluster configurations offer from 1 to 8 GPUs × 24 GB, with vCPUs from 8 to 128 and RAM from 32 GB to 512 GB.

NVIDIA® RTX 6000 Pro

A professional GPU for:

accelerating generative AI;
inference of large language models (LLMs);
fine-tuning models;
image and video generation;
3D rendering and video processing.

Based on the Blackwell® architecture, with 96 GB of GDDR7 memory and memory bandwidth of up to 1.6 TB/s. See detailed specifications of the NVIDIA® RTX 6000 Pro in the NVIDIA® documentation.

Fixed Managed Kubernetes cluster configurations offer from 1 to 8 GPUs × 96 GB, with vCPUs from 16 to 256 and RAM from 120 GB to 1 TB.