Create a Managed Kubernetes cluster on a cloud server with a GPU
You can add GPUs (graphics processing units) to a Managed Kubernetes cluster on a cloud server either when you create the cluster or later, by adding a node group on a cloud server.
GPU availability in regions can be viewed in the GPU for Managed Kubernetes availability matrix.
On nodes with a GPU, you can use pre-installed drivers or install drivers yourself. For GPU node groups without drivers, cluster autoscaling is unavailable.
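The driver and autoscaling rule above can be expressed as a small pre-flight check. This is only an illustrative sketch: the `NodeGroup` class and its field names are hypothetical and are not part of any provider API.

```python
from dataclasses import dataclass


@dataclass
class NodeGroup:
    """Hypothetical description of a node group, used only for this check."""
    has_gpu: bool
    preinstalled_drivers: bool = True  # the GPU Drivers toggle is enabled by default
    autoscaling: bool = False


def validate(group: NodeGroup) -> list[str]:
    """Return a list of problems; an empty list means the settings are consistent."""
    problems = []
    if group.has_gpu and not group.preinstalled_drivers and group.autoscaling:
        problems.append(
            "cluster autoscaling is unavailable for GPU node groups "
            "without pre-installed drivers"
        )
    return problems


if __name__ == "__main__":
    print(validate(NodeGroup(has_gpu=True, preinstalled_drivers=False, autoscaling=True)))
```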
Create a cluster on a cloud server with a GPU
Follow the Create a Managed Kubernetes cluster on a cloud server guide.
When configuring the node group, select:
- configuration — a fixed configuration of a node group with a GPU;
- GPU drivers — the GPU Drivers toggle is enabled by default, and the cluster uses pre-installed drivers. To install GPU drivers yourself, disable the GPU Drivers toggle.
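Once the cluster is running, a workload usually asks for a GPU through the `nvidia.com/gpu` extended resource. The sketch below builds such a Pod manifest as JSON, which `kubectl apply -f` accepts. It assumes that the NVIDIA device plugin is running on the GPU nodes (this page does not state whether the pre-installed drivers include it); the pod name and container image are placeholders chosen for illustration.

```python
import json


def gpu_pod_manifest(name: str, image: str, gpus: int = 1) -> dict:
    """Build a Pod manifest that requests `gpus` NVIDIA GPUs."""
    if gpus < 1:
        raise ValueError("gpus must be at least 1")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": name,
                    "image": image,
                    # nvidia-smi lists the GPUs visible inside the container.
                    "command": ["nvidia-smi"],
                    # GPUs are requested in the limits section.
                    "resources": {"limits": {"nvidia.com/gpu": gpus}},
                }
            ],
        },
    }


if __name__ == "__main__":
    # Placeholder name and image; substitute your own.
    manifest = gpu_pod_manifest("gpu-check", "nvidia/cuda:12.4.1-base-ubuntu22.04")
    print(json.dumps(manifest, indent=2))
```

Saving the output to a file and applying it with `kubectl apply -f` should schedule the pod on a GPU node, provided the `nvidia.com/gpu` resource is advertised there.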
Available GPUs
You can view the current list of GPUs in the control panel: in the top menu, click Products → Managed Kubernetes → Create cluster → the Node groups step → Cloud server Node configuration → Fixed with GPU.
You can check GPU availability in regions in the GPU for Managed Kubernetes availability matrix.
NVIDIA® A100 40 GB
Features maximum performance for AI, HPC, and data processing. Suitable for deep learning, scientific research, and data analytics.
Based on the Ampere® architecture, with a throughput of up to 1.5 TB/s. See detailed specifications of the NVIDIA® A100 40 GB in the NVIDIA® documentation.
In fixed Managed Kubernetes cluster configurations, 1 to 8 GPUs × 40 GB are available, with 6 to 48 vCPUs, and 87 to 704 GB RAM.
NVIDIA® A100 80 GB
Features maximum performance for AI, HPC, and data processing, as well as a large amount of memory for resource-intensive tasks. Suitable for deep learning, scientific research, and data analytics.
Based on the Ampere® architecture, with 80 GB of HBM2e memory and a throughput of up to 2 TB/s. See detailed specifications of the NVIDIA® A100 80 GB in the NVIDIA® documentation.
In fixed Managed Kubernetes cluster configurations, 1 to 8 GPUs × 80 GB are available, with 12 to 192 vCPUs, and 128 to 1000 GB RAM.
NVIDIA® Tesla T4
Suitable for machine learning and deep learning, inference, graphics processing, and video rendering. It works with most AI frameworks and is compatible with all types of neural networks.
Based on the Turing® architecture, with a throughput of up to 300 GB/s. See detailed specifications of the NVIDIA® Tesla T4 in the NVIDIA® documentation.
In fixed Managed Kubernetes cluster configurations, 1 to 4 GPUs × 16 GB are available, with 4 to 24 vCPUs, and 32 to 320 GB RAM.
NVIDIA® A30
Suitable for AI inference, HPC, natural language processing, conversational AI, and recommendation systems.
Based on the Ampere® architecture, with a throughput of up to 933 GB/s. See detailed specifications of the NVIDIA® A30 in the NVIDIA® documentation.
In fixed Managed Kubernetes cluster configurations, 1 to 2 GPUs × 24 GB are available, with 16 to 48 vCPUs, and 64 to 320 GB RAM.
NVIDIA® A2
An entry-level GPU. Suitable for simple inference, video and graphics, edge AI (edge computing), edge video, and mobile cloud gaming.
Based on the Ampere® architecture, with a throughput of up to 200 GB/s. See detailed specifications of the NVIDIA® A2 in the NVIDIA® documentation.
In fixed Managed Kubernetes cluster configurations, 1 to 4 GPUs × 16 GB are available, with 12 to 48 vCPUs, and 32 to 320 GB RAM.
NVIDIA® GTX 1080
A high-performance, energy-efficient GPU built on FinFET technology with GDDR5X memory. Dynamic load balancing distributes tasks so that resources do not sit idle. It delivers high performance for displays, VR, ultra-high resolutions, and data processing.
Based on the Pascal® architecture, with a throughput of up to 320 GB/s. See detailed specifications of the NVIDIA® GTX 1080 in the NVIDIA® documentation.
In fixed Managed Kubernetes cluster configurations, 1 to 8 GPUs × 8 GB are available, with 8 to 28 vCPUs, and 24 to 96 GB RAM.
NVIDIA® RTX 2080 Ti
High-performance GPU for complex graphics tasks. Suitable for:
- high-resolution video processing;
- 3D modeling;
- rendering and photo editing;
- neural network training;
- complex artificial intelligence computations;
- large data volume processing.
Based on the Turing® architecture, with a throughput of up to 616 GB/s. See detailed specifications of the NVIDIA® RTX 2080 Ti in the NVIDIA® documentation.
Fixed Managed Kubernetes cluster configurations offer from 1 to 4 GPUs × 11 GB, with vCPUs from 2 to 48 and RAM from 32 to 320 GB.
NVIDIA® RTX 4090 24 GB
A high-performance GeForce series GPU. Suitable for professional design and 3D modeling, video production, rendering, ML tasks (training and model inference), working with large language models (LLMs), as well as scientific and engineering calculations (e.g., climate modeling or bioinformatics).
Based on the Ada Lovelace® architecture, with a throughput of up to 1008 GB/s. See detailed specifications of the NVIDIA® RTX 4090 in the NVIDIA® documentation.
Fixed Managed Kubernetes cluster configurations offer from 1 to 4 GPUs × 24 GB, with vCPUs from 4 to 64 and RAM from 16 to 356 GB.
NVIDIA® RTX 4090 48 GB
A high-performance GeForce series GPU with more memory than the NVIDIA® RTX 4090 24 GB, suitable for:
- professional design and 3D modeling;
- video production and rendering;
- ML tasks (training and model inference);
- working with large language models (LLMs);
- scientific and engineering calculations (e.g., climate modeling or bioinformatics).
Based on the Ada Lovelace® architecture, with 48 GB of GDDR6X memory and a throughput of up to 1008 GB/s. See detailed specifications of the NVIDIA® RTX 4090 in the NVIDIA® documentation.
Fixed Managed Kubernetes cluster configurations offer from 1 to 8 GPUs × 48 GB, with vCPUs from 12 to 192, RAM from 64 to 896 GB, and a local disk from 64 to 800 GB.
NVIDIA® RTX 6000 Ada
A professional GPU that combines compute and graphics performance. Suitable for ML tasks, rendering, scientific computing, and high-performance visualization.
Based on the Ada Lovelace® architecture, with 48 GB of GDDR6 memory and a throughput of up to 960 GB/s. See detailed specifications of the NVIDIA® RTX 6000 Ada in the NVIDIA® documentation.
Fixed Managed Kubernetes cluster configurations offer from 1 to 4 GPUs × 48 GB, with vCPUs from 12 to 96 and RAM from 64 to 450 GB.
NVIDIA® A2000
An energy-efficient GPU for compact workstations. Suitable for AI, graphics, and video rendering.
Based on the Ampere® architecture, with a throughput of up to 288 GB/s. See detailed specifications of the NVIDIA® A2000 in the NVIDIA® documentation.
Fixed Managed Kubernetes cluster configurations offer from 1 to 4 GPUs × 6 GB, with vCPUs from 6 to 24 and RAM from 16 to 320 GB.
NVIDIA® A5000
A versatile GPU suitable for any tasks within its performance range.
Based on the Ampere® architecture, with a throughput of up to 768 GB/s. See detailed specifications of the NVIDIA® A5000 in the NVIDIA® documentation.
Fixed Managed Kubernetes cluster configurations offer from 1 to 2 GPUs × 24 GB, with vCPUs from 8 to 48 and RAM from 32 to 320 GB.
NVIDIA® H100
A powerful GPU suitable for AI, HPC, and scalable computing.
Based on the Hopper™ architecture, with 80 GB of HBM3 memory and a throughput of up to 3 TB/s. See detailed specifications of the NVIDIA® H100 in the NVIDIA® documentation.
Fixed Managed Kubernetes cluster configurations offer from 1 to 2 GPUs × 80 GB, with vCPUs from 12 to 48 and RAM from 128 to 256 GB.
NVIDIA® H200
A professional GPU suitable for:
- accelerating generative AI;
- high-performance computing (HPC);
- inference of large language models (LLMs);
- fine-tuning models;
- image and video generation.
Based on the Hopper™ architecture, with 141 GB of HBM3e memory and a throughput of up to 4.8 TB/s. See detailed specifications of the NVIDIA® H200 in the NVIDIA® documentation.
Fixed Managed Kubernetes cluster configurations offer from 1 to 8 GPUs × 141 GB, with vCPUs from 12 to 192 and RAM from 120 GB to 1 TB.
NVIDIA® L4
A versatile GPU for accelerating AI/ML workloads, video processing, streaming, and VDI. Suitable for running modern large language models (LLMs) and multimodal models.
Based on the Ada Lovelace® architecture, with 24 GB of GDDR6 memory and a throughput of up to 300 GB/s. See detailed specifications of the NVIDIA® L4 in the NVIDIA® documentation.
Fixed Managed Kubernetes cluster configurations offer from 1 to 8 GPUs × 24 GB, with vCPUs from 8 to 128 and RAM from 32 GB to 512 GB.
NVIDIA® RTX 6000 Pro
A professional GPU suitable for:
- accelerating generative AI;
- inference of large language models (LLMs);
- fine-tuning models;
- image and video generation;
- 3D rendering and video processing.
Based on the Blackwell® architecture, with 96 GB of GDDR7 memory and a throughput of up to 1.6 TB/s. See detailed specifications of the NVIDIA® RTX 6000 Pro in the NVIDIA® documentation.
Fixed Managed Kubernetes cluster configurations offer from 1 to 8 GPUs × 96 GB, with vCPUs from 16 to 256 and RAM from 120 GB to 1 TB.
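To compare the options above by total GPU memory per node, the following sketch finds, for each model, the smallest GPU count whose combined memory meets a requirement. The maximum GPU counts and per-GPU memory sizes are copied from the list above. Two simplifying assumptions apply: every GPU count between 1 and the maximum is treated as available (the actual fixed configurations may offer only some counts), and the workload is assumed to be able to spread across several GPUs. The function name `smallest_fit` is illustrative.

```python
import math

# model: (maximum GPUs per node, memory per GPU in GB), taken from the list above
GPU_CATALOG = {
    "NVIDIA A100 40 GB": (8, 40),
    "NVIDIA A100 80 GB": (8, 80),
    "NVIDIA Tesla T4": (4, 16),
    "NVIDIA A30": (2, 24),
    "NVIDIA A2": (4, 16),
    "NVIDIA GTX 1080": (8, 8),
    "NVIDIA RTX 2080 Ti": (4, 11),
    "NVIDIA RTX 4090 24 GB": (4, 24),
    "NVIDIA RTX 4090 48 GB": (8, 48),
    "NVIDIA RTX 6000 Ada": (4, 48),
    "NVIDIA A2000": (4, 6),
    "NVIDIA A5000": (2, 24),
    "NVIDIA H100": (2, 80),
    "NVIDIA H200": (8, 141),
    "NVIDIA L4": (8, 24),
    "NVIDIA RTX 6000 Pro": (8, 96),
}


def smallest_fit(required_gb: float) -> list[tuple[str, int, int]]:
    """Return (model, GPU count, total GB) for every model that can reach
    `required_gb` of combined GPU memory on one node, fewest GPUs first."""
    results = []
    for model, (max_gpus, gb_per_gpu) in GPU_CATALOG.items():
        count = max(1, math.ceil(required_gb / gb_per_gpu))
        if count <= max_gpus:
            results.append((model, count, count * gb_per_gpu))
    # Sort by GPU count, then by total memory.
    return sorted(results, key=lambda r: (r[1], r[2]))


if __name__ == "__main__":
    for model, count, total in smallest_fit(300):
        print(f"{model}: {count} GPUs, {total} GB total")
```

For a 300 GB requirement, the first entry is the H200 with 3 GPUs (423 GB in total), while models such as the H100 are excluded because two GPUs × 80 GB cannot reach that amount.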