Graphics Processing Units (GPUs)

You can add a GPU (graphics processing unit) to a cloud server when creating a server or to an existing server.

GPUs are used as dedicated PCI devices inside a cloud server.
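Because the GPU appears as an ordinary PCI device, a Linux guest can list it through sysfs. The following is a minimal sketch, assuming a Linux server with sysfs mounted at the usual path; the function name `find_nvidia_pci_devices` is illustrative and not part of any provider tooling. It matches on NVIDIA's PCI vendor ID, 0x10de.

```python
from pathlib import Path

NVIDIA_VENDOR_ID = "0x10de"  # PCI vendor ID assigned to NVIDIA


def find_nvidia_pci_devices(sysfs_root="/sys/bus/pci/devices"):
    """Return the PCI addresses of NVIDIA devices found under sysfs_root.

    Hypothetical helper: the default path is the standard Linux sysfs
    location, passed as a parameter so the function can be tested.
    """
    root = Path(sysfs_root)
    if not root.is_dir():
        return []  # not a Linux host, or sysfs is not mounted
    found = []
    for device in sorted(root.iterdir()):
        try:
            vendor = (device / "vendor").read_text().strip().lower()
        except OSError:
            continue  # entry without a readable vendor attribute
        if vendor == NVIDIA_VENDOR_ID:
            # Note: this also matches non-GPU NVIDIA functions, if any exist.
            found.append(device.name)
    return found
```

Calling `find_nvidia_pci_devices()` on a server with one GPU attached would be expected to return a single PCI address. An empty list suggests that no GPU is attached or that the system is not Linux.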

GPUs are available in fixed GPU lineup configurations and in custom configurations.

GPU lineup configurations and custom configurations with GPUs can be used with a local or network boot volume.

For cloud servers with a local disk, you can use:

NVIDIA® A100 40 GB;
NVIDIA® A100 80 GB;
NVIDIA® A30;
NVIDIA® RTX 4090 48 GB;
NVIDIA® A5000;
NVIDIA® RTX 6000 Ada;
NVIDIA® RTX 6000 Pro;
NVIDIA® H200.

If you need a server with a pre-configured set of machine learning and data analysis tools and libraries, use the AI Marketplace.

Available GPUs

GPU	Memory	CUDA Cores	Tensor Cores
NVIDIA® A100 40 GB, NVIDIA® A100 40 GB NVLink (upon request)	40 GB HBM2	6912	432
NVIDIA® A100 80 GB	80 GB HBM2	6912	432
NVIDIA® Tesla T4	16 GB GDDR6	2560	320
NVIDIA® A30	24 GB HBM2	3584	224
NVIDIA® A2 (updated analog of NVIDIA® Tesla T4)	16 GB GDDR6	1280	40
NVIDIA® GTX 1080	8 GB GDDR5X	2560	✗
NVIDIA® RTX 2080 Ti	11 GB GDDR6	4352	544
NVIDIA® RTX 4090 24 GB	24 GB GDDR6X	16384	512
NVIDIA® RTX 4090 48 GB	48 GB GDDR6X	16384	512
NVIDIA® RTX 6000 Ada (analog of L40)	48 GB GDDR6	18176	568
NVIDIA® A2000 (analog of RTX 3060)	6 GB GDDR6	3328	104
NVIDIA® A5000 (analog of RTX 3080)	24 GB GDDR6	8192	256
NVIDIA® H100	80 GB HBM3	16896	528
NVIDIA® H200	141 GB HBM3e	16896	528
NVIDIA® L4	24 GB GDDR6	20480	640
NVIDIA® RTX 6000 Pro	96 GB GDDR7	18432	576
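When sizing a server, it can help to multiply the per-GPU memory from the table above by the number of GPUs. The sketch below does only that arithmetic for a few rows. The dictionary and the function name `total_gpu_memory_gb` are illustrative. The result is an upper bound: each GPU keeps its own memory, so a workload must be split across GPUs to use the total.

```python
# Memory per GPU in GB, copied from a subset of the rows in the table above.
GPU_MEMORY_GB = {
    "A100 80 GB": 80,
    "H100": 80,
    "H200": 141,
    "RTX 4090 24 GB": 24,
    "Tesla T4": 16,
}


def total_gpu_memory_gb(model, gpu_count):
    """Return the summed GPU memory in GB for gpu_count GPUs of one model."""
    if gpu_count < 1:
        raise ValueError("gpu_count must be at least 1")
    return GPU_MEMORY_GB[model] * gpu_count


print(total_gpu_memory_gb("H200", 8))        # 8 x 141 GB
print(total_gpu_memory_gb("A100 80 GB", 2))  # 2 x 80 GB
```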

You can view the current list of GPUs in the control panel: in the top menu, click Products → Cloud Servers → Create Server.

You can check GPU availability in regions in the GPU for Cloud Servers availability matrix.

NVIDIA® A100 40 GB

Features maximum performance for AI, HPC, and data processing. Suitable for deep learning, scientific research, and data analytics.

Based on the Ampere® architecture, with 40 GB of HBM2 memory and a bandwidth of up to 1.5 TB/s. See the detailed specifications of the NVIDIA® A100 40 GB in the NVIDIA® documentation.

In fixed GPU lineup configurations, 1 to 8 GPUs × 40 GB are available, with 6 to 48 vCPUs and 87 to 700 GB of RAM.

In custom configurations, 1 to 8 GPUs × 40 GB are available, with 2 to 32 vCPUs and 512 MB to 256 GB of RAM.

NVIDIA® A100 40 GB NVLink

You can combine two NVIDIA® A100 40 GB GPUs using NVLink technology.

NVLink transfers data between GPUs faster than the PCIe interface. GPUs connected via NVLink can use more memory and increase server performance for complex calculations, such as training large language models.

NVLink works with NVIDIA® A100 40 GB GPUs, which are based on the Ampere® architecture, with 40 GB of HBM2 memory and a bandwidth of up to 1.5 TB/s. See the detailed specifications of the NVIDIA® A100 40 GB and a description of the NVLink technology in the NVIDIA® documentation.

NVIDIA® A100 40 GB GPUs with NVLink are available upon request: create a ticket.
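To confirm on a running server that two GPUs are linked via NVLink rather than PCIe only, one option is to inspect the topology matrix printed by `nvidia-smi topo -m`, where NVLink connections are shown as cells such as `NV12`. The following is a rough sketch, assuming the NVIDIA driver and `nvidia-smi` are installed. The helper names are illustrative, and the exact matrix layout may vary between driver versions.

```python
import re
import shutil
import subprocess

NVLINK_CELL = re.compile(r"^NV\d+$")  # for example NV4 or NV12
GPU_ROW = re.compile(r"^GPU\d+$")


def has_nvlink(topo_output):
    """Return True if any GPU row of the topology matrix has an NVLink cell."""
    for line in topo_output.splitlines():
        cells = line.split()
        # Only rows that start with a GPU label belong to the matrix body;
        # the header row also matches but contains no NV<number> cells.
        if cells and GPU_ROW.match(cells[0]):
            if any(NVLINK_CELL.match(cell) for cell in cells[1:]):
                return True
    return False


def read_topology():
    """Return the output of `nvidia-smi topo -m`, or "" if it is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return ""
    result = subprocess.run(
        ["nvidia-smi", "topo", "-m"], capture_output=True, text=True
    )
    return result.stdout if result.returncode == 0 else ""
```

On a server with the tool installed, `has_nvlink(read_topology())` gives a quick yes or no answer. A result of `False` only means that no NVLink cell was found in the output.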

NVIDIA® A100 80 GB

Features maximum performance for AI, HPC, and data processing, as well as a large amount of memory for resource-intensive tasks. Suitable for deep learning, scientific research, and data analytics.

Based on the Ampere® architecture, with 80 GB of HBM2 memory and a bandwidth of up to 1.5 TB/s. See the detailed specifications of the NVIDIA® A100 80 GB in the NVIDIA® documentation.

In fixed GPU lineup configurations, 1 to 8 GPUs × 80 GB are available, with 12 to 96 vCPUs, 128 to 1000 GB of RAM, and 128 GB to 6.88 TB of local disk.

In custom configurations, 1 to 8 GPUs × 80 GB are available, with 12 to 192 vCPUs, 64 GB to 1000 GB of RAM, and 256 GB to 3.36 TB of local disk.
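The custom configuration ranges above can be expressed as a simple range check before a server is requested. This is a minimal sketch, assuming each parameter is limited independently and that 3.36 TB equals 3360 GB. The class and function names are illustrative, and the control panel may enforce additional rules that are not modeled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limits:
    """Inclusive (minimum, maximum) ranges for one GPU configuration type."""
    gpus: tuple
    vcpus: tuple
    ram_gb: tuple
    disk_gb: tuple


# Custom configuration ranges for the A100 80 GB, taken from the text above.
# Assumption: 3.36 TB is treated as 3360 GB.
A100_80GB_CUSTOM = Limits(
    gpus=(1, 8), vcpus=(12, 192), ram_gb=(64, 1000), disk_gb=(256, 3360)
)


def check_configuration(limits, gpus, vcpus, ram_gb, disk_gb):
    """Return a list of messages for values that fall outside the limits."""
    requested = {"gpus": gpus, "vcpus": vcpus, "ram_gb": ram_gb, "disk_gb": disk_gb}
    problems = []
    for name, value in requested.items():
        low, high = getattr(limits, name)
        if not low <= value <= high:
            problems.append(f"{name}={value} is outside {low}..{high}")
    return problems
```

An empty list means that every value is within its documented range. For example, `check_configuration(A100_80GB_CUSTOM, gpus=2, vcpus=24, ram_gb=256, disk_gb=512)` reports no problems.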

NVIDIA® Tesla T4

Suitable for machine learning and deep learning, inference, graphics processing, and video rendering. Works with most AI frameworks and is compatible with all types of neural networks.

Based on the Turing® architecture, with 16 GB of GDDR6 memory and a bandwidth of up to 300 GB/s. See the detailed specifications of the NVIDIA® Tesla T4 in the NVIDIA® documentation.

In fixed GPU lineup configurations, 1 to 4 GPUs × 16 GB are available, with 4 to 24 vCPUs and 32 to 320 GB of RAM.

In custom configurations, 1 to 4 GPUs × 16 GB are available, with 2 to 32 vCPUs and 512 MB to 256 GB of RAM.

NVIDIA® A30

Suitable for AI inference, HPC, language processing, conversational AI, and recommendation systems.

Based on the Ampere® architecture, with 24 GB of HBM2 memory and a bandwidth of up to 933 GB/s. See the detailed specifications of the NVIDIA® A30 in the NVIDIA® documentation.

In fixed GPU lineup configurations, 1 to 2 GPUs × 24 GB are available, with 16 to 48 vCPUs and 64 to 320 GB of RAM.

In custom configurations, 1 to 2 GPUs × 24 GB are available, with 2 to 32 vCPUs and 512 MB to 256 GB of RAM.

NVIDIA® A2

Entry-level GPU. Suitable for simple inference, video and graphics, Edge AI (edge computing), Edge video, and mobile cloud gaming.

Based on the Ampere® architecture, with 16 GB of GDDR6 memory and a bandwidth of up to 200 GB/s. See the detailed specifications of the NVIDIA® A2 in the NVIDIA® documentation.

In fixed GPU lineup configurations, 1 to 4 GPUs × 16 GB are available, with 12 to 48 vCPUs and 32 to 320 GB of RAM.

In custom configurations, 1 to 4 GPUs × 16 GB are available, with 2 to 32 vCPUs and 512 MB to 256 GB of RAM.

NVIDIA® GTX 1080

A high-performance and energy-efficient GPU. The solution is implemented using FinFET technology and GDDR5X memory. Dynamic load balancing helps distribute tasks to prevent resources from idling. Features maximum performance for display purposes, VR, ultra-high resolution settings, and data processing.

Based on the Pascal® architecture, with 8 GB of GDDR5X memory and a bandwidth of up to 320 GB/s. See the detailed specifications of the NVIDIA® GTX 1080 in the NVIDIA® documentation.

In fixed GPU lineup configurations, 1 to 8 GPUs × 8 GB are available, with 8 to 28 vCPUs and 24 to 96 GB of RAM.

In custom configurations, 1 to 8 GPUs × 8 GB are available, with 2 to 32 vCPUs and 512 MB to 256 GB of RAM.

NVIDIA® RTX 2080 Ti

A high-performance GPU for complex graphics tasks. Suitable for:

high-resolution video processing;
3D model creation;
rendering and photo processing;
neural network training;
complex artificial intelligence calculations;
large-scale data processing.

Based on the Turing® architecture, with 11 GB of GDDR6 memory and a bandwidth of up to 616 GB/s. See the detailed specifications of the NVIDIA® RTX 2080 Ti in the NVIDIA® documentation.

Fixed GPU lineup configurations offer 1 to 4 GPUs × 11 GB, with 2 to 48 vCPUs and 32 to 320 GB of RAM.

Custom configurations offer 1 to 4 GPUs × 11 GB, with 2 to 32 vCPUs and 512 MB to 256 GB of RAM.

NVIDIA® RTX 4090 24 GB

A high-performance GeForce series GPU suitable for:

professional design and 3D modeling;
video editing and rendering;
ML tasks (model training and inference);
working with large language models (LLMs);
scientific and engineering computing (for example, climate modeling or bioinformatics).

Based on the Ada Lovelace® architecture, with 24 GB of GDDR6X memory and a bandwidth of up to 1008 GB/s. See the detailed specifications of the NVIDIA® RTX 4090 in the NVIDIA® documentation.

Fixed GPU lineup configurations offer 1 to 4 GPUs × 24 GB, with 4 to 64 vCPUs and 16 to 356 GB of RAM.

Custom configurations offer 1 to 4 GPUs × 24 GB, with 2 to 32 vCPUs and 4 to 256 GB of RAM.

NVIDIA® RTX 4090 48 GB

A high-performance GeForce series GPU with more memory than the NVIDIA® RTX 4090 24 GB, suitable for:

professional design and 3D modeling;
video editing and rendering;
ML tasks (model training and inference);
working with large language models (LLMs);
scientific and engineering calculations (e.g., climate modeling or bioinformatics).

Based on the Ada Lovelace® architecture, with 48 GB of GDDR6X memory and a bandwidth of up to 1008 GB/s. See the detailed specifications of the NVIDIA® RTX 4090 in the NVIDIA® documentation.

Fixed GPU lineup configurations offer 1 to 8 GPUs × 48 GB, with 12 to 192 vCPUs, 64 to 896 GB of RAM, and a local disk from 64 to 800 GB.

Custom configurations offer 1 to 8 GPUs × 48 GB, with 12 to 192 vCPUs, 64 to 896 GB of RAM, and a local disk from 50 to 800 GB.

NVIDIA® RTX 6000 Ada

Professional GPU for computing and graphics power. Suitable for ML tasks, rendering, scientific computing, and high-performance visualization.

Based on the Ada Lovelace® architecture, with 48 GB of GDDR6 memory and a bandwidth of up to 960 GB/s. See the detailed specifications of the NVIDIA® RTX 6000 Ada in the NVIDIA® documentation.

Fixed GPU lineup configurations offer 1 to 4 GPUs × 48 GB, with 12 to 96 vCPUs, 64 to 450 GB of RAM, and a local disk from 64 GB to 2 TB.

Custom configurations offer 1 to 4 GPUs × 48 GB, with 12 to 96 vCPUs, 64 to 450 GB of RAM, and a local disk from 64 GB to 3.52 TB.

NVIDIA® A2000

Energy-efficient GPU for compact workstations. Suitable for AI, graphics, and video rendering.

Based on the Ampere® architecture, with 6 GB of GDDR6 memory and a bandwidth of up to 288 GB/s. See the detailed specifications of the NVIDIA® A2000 in the NVIDIA® documentation.

Fixed GPU lineup configurations offer 1 to 4 GPUs × 6 GB, with 6 to 24 vCPUs and 16 to 320 GB of RAM.

Custom configurations offer 1 to 4 GPUs × 6 GB, with 2 to 32 vCPUs and 512 MB to 256 GB of RAM.

NVIDIA® A5000

A versatile GPU suitable for any task within its performance range.

Based on the Ampere® architecture, with 24 GB of GDDR6 memory and a bandwidth of up to 768 GB/s. See the detailed specifications of the NVIDIA® A5000 in the NVIDIA® documentation.

Fixed GPU lineup configurations offer 1 to 8 GPUs × 24 GB, with 8 to 128 vCPUs, 16 to 700 GB of RAM, and a local disk from 64 GB to 1 TB.

Custom configurations offer 1 to 8 GPUs × 24 GB, with 2 to 128 vCPUs, 2 to 731 GB of RAM, and a local disk from 20 GB to 2 TB.

NVIDIA® H100

A powerful GPU suitable for AI, HPC, and scalable computing.

Based on the Hopper™ architecture, with 80 GB of HBM3 memory and a bandwidth of up to 3 TB/s. See the detailed specifications of the NVIDIA® H100 in the NVIDIA® documentation.

Fixed GPU lineup configurations offer 1 to 2 GPUs × 80 GB, with 12 to 48 vCPUs and 128 to 256 GB of RAM.

Custom configurations offer 1 to 2 GPUs × 80 GB, with 2 to 48 vCPUs and 2 GB to 256 GB of RAM.

NVIDIA® H200

A professional GPU suitable for:

accelerating generative AI;
high-performance computing (HPC);
inference of large language models (LLMs);
model fine-tuning;
image and video generation.

Based on the Hopper™ architecture, with 141 GB of HBM3e memory and a bandwidth of up to 4.8 TB/s. See the detailed specifications of the NVIDIA® H200 in the NVIDIA® documentation.

Fixed and custom GPU lineup configurations offer 1 to 8 GPUs × 141 GB, with 12 to 192 vCPUs, 120 GB to 1 TB of RAM, and a local disk from 256 GB to 3 TB.

NVIDIA® L4

Universal GPU for accelerating AI/ML workloads, video processing, streaming, and VDI. Suitable for running large language models (LLMs) and multimodal models.

Based on the Ada Lovelace® architecture, with 24 GB of GDDR6 memory and a bandwidth of up to 300 GB/s. See the detailed specifications of the NVIDIA® L4 in the NVIDIA® documentation.

Fixed GPU lineup configurations offer 1 to 8 GPUs × 24 GB, with 8 to 128 vCPUs and 32 to 512 GB of RAM.

Custom configurations offer 1 to 8 GPUs × 24 GB, with 8 to 256 vCPUs and 64 to 640 GB of RAM.

NVIDIA® RTX 6000 Pro

A professional GPU for:

accelerating generative AI;
inference of language models (LLMs);
model fine-tuning;
image and video generation;
3D rendering and video processing.

Based on the Blackwell® architecture, with 96 GB of GDDR7 memory and a bandwidth of up to 1.6 TB/s. See the detailed specifications of the NVIDIA® RTX 6000 Pro in the NVIDIA® documentation.

Fixed and custom GPU lineup configurations offer 1 to 8 GPUs × 96 GB, with 16 to 256 vCPUs, 120 GB to 1 TB of RAM, and a local disk from 256 GB to 3 TB.

Create a cloud server with GPU

Follow the Create a cloud server instructions.

When creating a server, select:

source: a GPU-optimized off-the-shelf image, marked as GPU optimized in the version list. These images include the drivers required for GPU operation. If you choose another source, you must install the drivers on the server manually for stable NVIDIA® GPU operation;
configuration: a fixed GPU lineup configuration or a custom configuration with a GPU, with at least 2 vCPUs.
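After the server is created, you may want to confirm that the driver is installed and the GPU is visible, especially if the source was not a GPU-optimized image. The sketch below queries `nvidia-smi` in CSV mode and parses the result. The helper names are illustrative, and the code assumes that `nvidia-smi` is on the PATH whenever the driver is installed.

```python
import csv
import io
import shutil
import subprocess

QUERY = [
    "nvidia-smi",
    "--query-gpu=name,memory.total,driver_version",
    "--format=csv,noheader",
]


def parse_gpu_rows(csv_text):
    """Parse `nvidia-smi` CSV output into one dictionary per GPU."""
    rows = []
    for fields in csv.reader(io.StringIO(csv_text), skipinitialspace=True):
        if len(fields) != 3:
            continue  # skip blank or unexpected lines
        name, memory_total, driver_version = (field.strip() for field in fields)
        rows.append(
            {
                "name": name,
                "memory_total": memory_total,
                "driver_version": driver_version,
            }
        )
    return rows


def detect_gpus():
    """Return the GPUs reported by the driver, or [] if none are reported."""
    if shutil.which("nvidia-smi") is None:
        return []  # the driver tools are probably not installed
    result = subprocess.run(QUERY, capture_output=True, text=True)
    if result.returncode != 0:
        return []  # the driver is present but cannot reach a GPU
    return parse_gpu_rows(result.stdout)
```

If `detect_gpus()` returns an empty list on a server that has a GPU in its configuration, the most likely cause is a missing or mismatched driver.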