Configurations of inference services
When creating an inference service, you can select its configuration. The list of available configurations depends on the selected model and its parameters. Each configuration automatically sets the number and type of graphics processing units (GPUs), the number of vCPUs, and the amount of RAM.
Each configuration card lists the expected performance metrics for the model. You can compare configurations to find the right one for your application.
Once an inference service has been created, its configuration cannot be changed.
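The resource dimensions a configuration fixes, and the comparison of cards by their expected performance, can be sketched as a small record. This is a minimal illustration only: the class, field names, and GPU labels below are assumptions for the example, not an actual service API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)  # frozen: mirrors that a configuration cannot be changed later
class InferenceConfiguration:
    # Resources a configuration card describes (all names illustrative)
    gpu_type: str
    gpu_count: int
    vcpus: int
    ram_gib: int
    expected_tokens_per_sec: float  # expected performance metric from the card

def pick_fastest(configs: list[InferenceConfiguration]) -> InferenceConfiguration:
    """Compare configuration cards by their expected performance metric."""
    return max(configs, key=lambda c: c.expected_tokens_per_sec)

# Hypothetical configuration cards for one model
configs = [
    InferenceConfiguration("example-gpu-a", 1, 8, 32, 40.0),
    InferenceConfiguration("example-gpu-b", 2, 16, 64, 75.0),
]
best = pick_fastest(configs)
```

In this sketch, `best` is the two-GPU configuration, since its expected throughput (75.0) is the highest of the cards compared.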