Scale inference service

You can scale the inference service depending on the number of queries to the model:

  • configure a fixed number of inference instances — the actually deployed instances of the model. For example, add inference instances when the number of requests to the model increases, or remove them when it decreases;

  • or configure autoscaling. The number of inference instances will change automatically within the specified range, depending on the number of requests and their processing time.
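The autoscaling behavior described above can be illustrated with a toy scaling rule. This is only a conceptual sketch, not the provider's actual algorithm: the function name, the latency threshold, and the scale-down condition are all assumptions made for illustration. The key point it demonstrates is that the instance count always stays clamped to the configured range.

```python
def desired_instances(current, queue_len, avg_latency_s, target_latency_s,
                      min_instances, max_instances):
    """Toy autoscaling rule (illustrative only, not the service's real logic):
    scale up when average processing time exceeds the target,
    scale down when there is latency headroom and no queued requests."""
    if avg_latency_s > target_latency_s:
        current += 1                      # requests are too slow: add an instance
    elif avg_latency_s < 0.5 * target_latency_s and queue_len == 0:
        current -= 1                      # plenty of headroom: remove an instance
    # the result never leaves the configured [min, max] range
    return max(min_instances, min(current, max_instances))


# Example: 2 instances, latency above target -> scale up to 3
print(desired_instances(2, queue_len=4, avg_latency_s=1.5,
                        target_latency_s=1.0, min_instances=1, max_instances=5))
```

However the scaling decision is made, the clamping step is what the "specified range" in the setting corresponds to: the service never deploys fewer than the minimum or more than the maximum number of inference instances.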

To scale the inference service, change the number of inference instances.

Change the number of inference instances

You can use a fixed number of inference instances or set up autoscaling.

  1. In the control panel, on the top menu, click Products and select Inference Services.

  2. Open the inference service page → Service tab.

  3. In the Service Auto Scaling block, click Edit.

  4. Open the Fixed tab and specify the number of inference instances.

  5. Click Save. Applying the new scaling settings may take more than 10 minutes, and the inference service will be unavailable during this time.