
Foundation Models Catalog: Quick Start

  1. Create an inference service.
  2. Connect to the inference service.

1. Create an inference service

  1. Select a model.
  2. Set up the infrastructure.
  3. Configure the inference service.
  4. Confirm the configuration.

1. Select a model

  1. In the Control Panel, on the top menu, click Products and select Foundation Models Catalog.

  2. In the model card, click Create.

  3. Enter the name of the inference service.

  4. To filter the inference services in the list, add tags. A tag with the model name is automatically added. To add a new tag, in the Tags field, type a tag and press Enter.

  5. Optional: enter a description of the inference service. For example, specify its purpose.

  6. Click Continue.

2. Set up the infrastructure

  1. Set the parameters of the model.

    1.1 Select the data type of the model parameters.

    1.2 Select the data type for the KV cache.

    1.3 Select the maximum length of the context.

  2. Select the inference service configuration. When selecting, consider the expected model performance metrics.

    Once an inference service has been created, its configuration cannot be changed.

  3. Click Continue.
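To see why these settings matter, the KV-cache memory footprint per sequence can be roughly estimated as 2 × layers × KV heads × head dimension × context length × bytes per element. A minimal sketch, assuming an illustrative 7B-class model shape (not a specific catalog model):

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, context_len, dtype_bytes):
    # K and V tensors are both cached per layer, hence the factor of 2.
    return 2 * num_layers * num_kv_heads * head_dim * context_len * dtype_bytes

# Illustrative model shape: 32 layers, 32 KV heads, head dimension 128.
fp16 = kv_cache_bytes(32, 32, 128, 4096, 2)  # 2-byte elements (fp16 cache)
fp8 = kv_cache_bytes(32, 32, 128, 4096, 1)   # 1-byte elements halve the cache
print(f"fp16: {fp16 / 2**30:.1f} GiB, fp8: {fp8 / 2**30:.1f} GiB per sequence")
```

A smaller KV-cache data type or a shorter maximum context length reduces the memory each request pins on the configuration you select.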

3. Configure the inference service

  1. Configure the number of inference instances.

    1.1 To have a fixed number of instances in the service, open the Fixed tab and specify the number of instances.

    1.2 To use autoscaling in the service, open the With autoscaling tab and set the minimum and maximum number of instances. The number of instances will change automatically within the specified range depending on the load on the inference service.

    You can change the number of inference instances after you create the inference service. For more information, see the Scale inference service instructions.

  2. Select the disk type for the inference instance.

  3. Click Continue.

4. Confirm the configuration

  1. Check the final configuration of the inference service.

  2. Check the price of the inference service.

  3. Click Create Inference Service. Creating an inference service can take about 15 minutes.

2. Connect to the inference service

To connect to the inference service, send a test request via the Completions API or Chat API.

Use the Completions API to generate text from a single prompt, without dialog or message history support. For example, use it to continue a phrase, generate text from a template, or perform a one-time generation.

Use the Chat API for a chatbot-style conversation, with roles and message history.
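Assuming the service exposes an OpenAI-compatible HTTP API, the difference between the two request bodies can be sketched in Python. The field names mirror the curl example below; the /v1/chat/completions path is an assumption based on the OpenAI convention:

```python
import json

def completions_payload(model, prompt, temperature=0, max_tokens=7):
    # Completions API body: a single prompt string, no roles or history.
    return {"model": model, "prompt": prompt,
            "temperature": temperature, "max_tokens": max_tokens}

def chat_payload(model, messages, temperature=0, max_tokens=7):
    # Chat API body: a list of {"role", "content"} messages; resending
    # earlier turns is how the conversation history is preserved.
    return {"model": model, "messages": messages,
            "temperature": temperature, "max_tokens": max_tokens}

# POST as JSON to <endpoint>/v1/completions or <endpoint>/v1/chat/completions:
print(json.dumps(completions_payload("<model>", "Explain what a prompt is")))
print(json.dumps(chat_payload("<model>", [{"role": "user", "content": "Hello"}])))
```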

  1. Open the CLI.

  2. Send a test curl request:

curl <endpoint>/v1/completions \
-H "Authorization: Bearer <api_key>" \
-H "Content-Type: application/json" \
-d '{
"model": "<model>",
"prompt": "<prompt>",
"temperature": 0,
"max_tokens": 7
}'

Specify:

  • <endpoint> - the endpoint of the inference service. You can copy it in the control panel: in the top menu, click Products → Inference Services → inference services page → Quick Start tab → in the Endpoint block, click the copy icon;

  • <api_key> - the API key. You can copy it in the control panel: in the top menu, click Products → Inference Services → inference services page → API keys tab → in the API key line, click the view icon and then the copy icon;

  • <model> - the name of the model. You can look it up in the control panel: in the top menu, click Products → Inference Services → inference services page → Service tab → Model line;

  • <prompt> - the prompt, for example:

    Explain what a prompt is

You will receive a response in OpenAI API format.
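A minimal sketch of reading such a response in Python. The sample body below is illustrative of the OpenAI completions format, not actual service output:

```python
import json

# Illustrative response body in OpenAI completions format (values are made up).
raw = """{
  "id": "cmpl-123",
  "object": "text_completion",
  "choices": [{"index": 0,
               "text": " A prompt is the input text that the model continues.",
               "finish_reason": "length"}],
  "usage": {"prompt_tokens": 5, "completion_tokens": 7, "total_tokens": 12}
}"""

response = json.loads(raw)
# The generated continuation is in choices[0].text for the Completions API;
# the Chat API returns choices[0].message.content instead.
text = response["choices"][0]["text"]
print(text)
print("tokens used:", response["usage"]["total_tokens"])
```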