Configurations of inference services
When creating an inference service, you can select its configuration. The list of available configurations depends on the selected model and its parameters. Each configuration automatically sets the number and type of graphics processing units (GPUs), the number of vCPUs, and the amount of RAM.
Each configuration card lists the expected performance metrics for the model. You can compare configurations to find the right one for your application.
Once an inference service has been created, its configuration cannot be changed.
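The resource dimensions a configuration fixes, and the comparison of cards by their expected performance, can be sketched as a small record. This is a minimal illustration only: the class, field names, and GPU labels below are assumptions for the example, not an actual service API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)  # frozen: mirrors that a configuration cannot be changed later
class InferenceConfiguration:
    # Resources a configuration card describes (all names illustrative)
    gpu_type: str
    gpu_count: int
    vcpus: int
    ram_gib: int
    expected_tokens_per_sec: float  # expected performance metric from the card

def pick_fastest(configs: list[InferenceConfiguration]) -> InferenceConfiguration:
    """Compare configuration cards by their expected performance metric."""
    return max(configs, key=lambda c: c.expected_tokens_per_sec)

# Hypothetical configuration cards for one model
configs = [
    InferenceConfiguration("example-gpu-a", 1, 8, 32, 40.0),
    InferenceConfiguration("example-gpu-b", 2, 16, 64, 75.0),
]
best = pick_fastest(configs)
```

In this sketch, `best` is the two-GPU configuration, since its expected throughput (75.0) is the highest of the cards compared.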