Foundation Models Catalog: Quick Start
1. Create inference service
1. Select a model
- In the Control Panel, on the top menu, click Products and select Foundation Models Catalog.
- In the model card, click Create.
- Enter a name for the inference service.
- To filter inference services in the list, add tags. A tag with the model name is added automatically. To add a new tag, type it in the Tags field and press Enter.
- Optional: enter a description of the inference service, for example, its purpose.
- Click Continue.
2. Set up the infrastructure
- Set the model parameters:
  1. Select the data type of the model parameters.
  2. Select the data type for the KV cache.
  3. Select the maximum context length.
- Select the inference service configuration. When selecting, consider the expected performance metrics of the model. Once the inference service has been created, the configuration cannot be changed.
- Click Continue.
3. Configure the inference service
- Configure the number of inference instances:
  1. To use a fixed number of instances, open the Fixed tab and specify the number of instances.
  2. To use autoscaling, open the With autoscaling tab and set the minimum and maximum number of instances. The number of instances will change automatically within the specified range depending on the load on the inference service.
  You can change the number of inference instances after the service has been created. For more information, see the Scaling an inference service instruction.
- Select the disk type for the inference instances.
- Click Continue.
4. Confirm configuration
- Check the final configuration of the inference service.
- Check the price of the inference service.
- Click Create Inference Service. Creating an inference service can take about 15 minutes.
2. Connect to the inference service
Completions API
Use the Completions API to generate text from a single prompt, without dialog or message history. For example, use it to continue a phrase, generate text from a template, or perform a one-time generation.
- Open the CLI.
- Send a test curl request:
curl https://<inference_service_uuid>.wc.<pool>.inference.selcloud.ru/v1/completions \
-H "Authorization: Bearer <api_key>" \
-H "Content-Type: application/json" \
-d '{
"model": "<model>",
"prompt": "<prompt>",
"temperature": 0,
"max_tokens": 7
}'
Specify:
- <inference_service_uuid>: the UUID of the inference service. You can copy it in the control panel: in the top menu, click Products → Inference Services → in the inference service menu, select Copy UUID;
- <pool>: the pool where the inference service is created, for example ru-7. You can find it in the control panel: in the top menu, click Products → Inference Services → inference service card;
- <api_key>: the API key. You can copy it in the control panel: in the top menu, click Products → Inference Services → inference service page → API keys tab → in the API key line, copy the key;
- <model>: the model name. You can find it in the control panel: in the top menu, click Products → Inference Services → inference service page → Service tab → Model line;
- <prompt>: the text query to the model (prompt), for example: Explain what a prompt is.
You will receive a response in the OpenAI API format.
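The same Completions request can be sent from Python with only the standard library. In the sketch below, the base URL, API key, and model name are the same placeholders as in the curl example above; the network call itself is left commented out so the snippet can be inspected before the placeholders are filled in.

```python
import json
import urllib.request

# Placeholders -- substitute the values from your control panel.
BASE_URL = "https://<inference_service_uuid>.wc.<pool>.inference.selcloud.ru"
API_KEY = "<api_key>"

# Request body in the OpenAI Completions format, mirroring the curl example.
payload = {
    "model": "<model>",
    "prompt": "Explain what a prompt is",
    "temperature": 0,
    "max_tokens": 7,
}

request = urllib.request.Request(
    f"{BASE_URL}/v1/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
)

# Uncomment once the placeholders are filled in. The generated text is in
# response["choices"][0]["text"], as in the OpenAI API format.
# with urllib.request.urlopen(request) as resp:
#     response = json.load(resp)
#     print(response["choices"][0]["text"])
```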
Chat API
Use the Chat API to have a chatbot-like conversation, with roles and message history.
- Open the CLI.
- Send a test curl request:
curl https://<inference_service_uuid>.wc.<pool>.inference.selcloud.ru/v1/chat/completions \
-H "Authorization: Bearer <api_key>" \
-H "Content-Type: application/json" \
-d '{
"model": "<model>",
"messages": [
{
"role": "<role_1>",
"content": "<prompt_1>"
},
{
"role": "<role_2>",
"content": "<prompt_2>"
}
]
}'
Specify:
- <inference_service_uuid>: the UUID of the inference service. You can copy it in the control panel: in the top menu, click Products → Inference Services → in the inference service menu, select Copy UUID;
- <pool>: the pool where the inference service is created, for example ru-7. You can find it in the control panel: in the top menu, click Products → Inference Services → inference service card;
- <api_key>: the API key. You can copy it in the control panel: in the top menu, click Products → Inference Services → inference service page → API keys tab → in the API key line, copy the key;
- <model>: the model name. You can find it in the control panel: in the top menu, click Products → Inference Services → inference service page → Service tab → Model line;
- <role_1>: the role of the message sender, for example developer. It is used to structure the dialog and preserve the context of messages;
- <prompt_1>: the text query to the model (prompt) for this role, for example: You are a virtual assistant;
- <role_2>: the role of the message sender, for example user. It is used to structure the dialog and preserve the context of messages;
- <prompt_2>: the text query to the model (prompt) for this role, for example: Explain what a prompt is.
You will receive a response in the OpenAI API format.
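The Chat request can be sketched in Python the same way, again using only the standard library. The roles and prompts mirror the curl example above; the base URL, API key, and model name remain placeholders, so the actual network call is commented out.

```python
import json
import urllib.request

# Placeholders -- substitute the values from your control panel.
BASE_URL = "https://<inference_service_uuid>.wc.<pool>.inference.selcloud.ru"
API_KEY = "<api_key>"

# Request body in the OpenAI Chat Completions format: an ordered list of
# role/content messages, mirroring the curl example above.
payload = {
    "model": "<model>",
    "messages": [
        {"role": "developer", "content": "You are a virtual assistant"},
        {"role": "user", "content": "Explain what a prompt is"},
    ],
}

request = urllib.request.Request(
    f"{BASE_URL}/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
)

# Uncomment once the placeholders are filled in. The reply text is in
# response["choices"][0]["message"]["content"], as in the OpenAI API format.
# with urllib.request.urlopen(request) as resp:
#     response = json.load(resp)
#     print(response["choices"][0]["message"]["content"])
```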