Create an inference service
1. Select a model
- In the Control Panel, on the top menu, click Products and select Foundation Models Catalog.
- In the model card, click Create.
- Enter a name for the inference service.
- To filter inference services in the list, add tags. A tag with the model name is added automatically. To add a new tag, type it in the Tags field and press Enter.
- Optional: enter a description of the inference service, for example, its purpose.
- Click Continue.
2. Set up the infrastructure
- Set the model parameters:
  1. Select the data type of the model parameters.
  2. Select the data type for the KV cache.
  3. Select the maximum context length.
- Select the inference service configuration. When choosing, consider the expected performance metrics of the model.
  Once the inference service has been created, its configuration cannot be changed.
- Click Continue.
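The data types and maximum context length chosen above determine how much accelerator memory the KV cache consumes, which is why they matter when picking a configuration. A rough back-of-the-envelope estimate, assuming a hypothetical 7B-class model (the layer count, KV-head count, and head dimension below are illustrative, not values from the catalog):

```python
import math

# Rough KV-cache size: 2 (K and V) * layers * kv_heads * head_dim
# * context_length * bytes_per_element, per sequence in the batch.
def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   context_len: int, dtype_bytes: int, batch: int = 1) -> int:
    return 2 * layers * kv_heads * head_dim * context_len * dtype_bytes * batch

# Hypothetical model: 32 layers, 8 KV heads, head dimension 128, 8K context.
fp16 = kv_cache_bytes(32, 8, 128, 8192, dtype_bytes=2)  # FP16 KV cache
fp8 = kv_cache_bytes(32, 8, 128, 8192, dtype_bytes=1)   # FP8 halves the footprint
print(fp16 // 2**20, "MiB vs", fp8 // 2**20, "MiB")     # → 1024 MiB vs 512 MiB
```

Halving the KV-cache data type roughly doubles the context you can serve in the same memory, which is the trade-off these settings expose.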
3. Configure the inference service
- Configure the number of inference instances:
  1. To run a fixed number of instances, open the Fixed tab and specify the number of instances.
  2. To use autoscaling, open the With autoscaling tab and set the minimum and maximum number of instances. The number of instances changes automatically within the specified range, depending on the load on the inference service.
  You can change the number of inference instances after the service is created. For more information, see the Scaling an inference service instruction.
- Select the disk type for the inference instances.
- Click Continue.
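The autoscaling behavior described above — the instance count follows the load but never leaves the configured range — amounts to a clamp. A minimal sketch (the load-to-instances mapping here is a hypothetical illustration, not the platform's actual scaling policy):

```python
import math

def desired_instances(current_load: float, per_instance_capacity: float,
                      min_instances: int, max_instances: int) -> int:
    """Scale to cover the load, clamped to the configured [min, max] range."""
    needed = math.ceil(current_load / per_instance_capacity)
    return max(min_instances, min(max_instances, needed))

print(desired_instances(0, 10, 1, 5))    # idle load still keeps the minimum → 1
print(desired_instances(37, 10, 1, 5))   # 4 instances cover 37 req/s → 4
print(desired_instances(200, 10, 1, 5))  # demand beyond the maximum is capped → 5
```

Setting the minimum above zero keeps at least one warm instance serving requests; the maximum caps cost under load spikes.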
4. Confirm configuration
- Review the final configuration of the inference service.
- Review the price of the inference service.
- Click Create Inference Service. Creating an inference service can take about 15 minutes.
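Since creation can take about 15 minutes, a client script would typically poll for readiness rather than block on the console. A platform-agnostic sketch, where `get_status` is a hypothetical callable standing in for whatever status API or CLI your platform exposes:

```python
import time

def wait_until_ready(get_status, timeout_s: float = 20 * 60, poll_s: float = 30) -> bool:
    """Poll get_status() until it returns 'ready'; raise on failure or timeout."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        status = get_status()
        if status == "ready":
            return True
        if status == "error":
            raise RuntimeError("inference service creation failed")
        time.sleep(poll_s)
    raise TimeoutError("inference service was not ready within the timeout")

# Example with a fake status source that becomes ready on the third poll.
statuses = iter(["creating", "creating", "ready"])
print(wait_until_ready(lambda: next(statuses), timeout_s=60, poll_s=0))  # → True
```

Polling every 30 seconds is a reasonable default for an operation on the order of minutes; adjust the interval and timeout to your platform's guidance.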