General information about the Foundation Models Catalog product
The product is in limited testing (private preview).
Foundation Models Catalog is a catalog of preconfigured models with a ready-made API. Models are deployed as separate inference services: isolated services with dedicated resources (GPU, vCPU, RAM, and disk).
Each selected model is accessed through a dedicated scalable endpoint. The endpoints of the created inference services are compatible with the OpenAI API.
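Because the endpoints are OpenAI-compatible, any OpenAI-style client can talk to them. The sketch below builds such a request with the Python standard library without sending it; the endpoint URL, model name, and access token are placeholders, not real Selectel values.

```python
import json
import urllib.request

# Placeholders: substitute your inference service endpoint, model name,
# and access token (none of these are real Selectel values).
ENDPOINT = "https://example.inference.service/v1/chat/completions"
ACCESS_TOKEN = "YOUR_ACCESS_TOKEN"

def build_chat_request(prompt: str) -> urllib.request.Request:
    """Build an OpenAI-compatible chat completion request (not sent here)."""
    payload = {
        "model": "your-model-name",  # placeholder model identifier
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # The inference service access token is passed as a Bearer token.
            "Authorization": f"Bearer {ACCESS_TOKEN}",
        },
        method="POST",
    )

req = build_chat_request("Hello!")
print(req.get_header("Authorization"))  # Bearer YOUR_ACCESS_TOKEN
```

Sending the request with `urllib.request.urlopen(req)` would return an OpenAI-style JSON response, assuming the endpoint and token are valid.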
Principle of operation
Foundation Models Catalog runs on cloud platform resources. Each model is deployed on dedicated resources as a separate inference service. The inference service is built on a Managed Kubernetes cluster with GPUs.
The cluster configuration for the inference service is selected automatically based on the model parameters you choose. Depending on the selected configuration, the expected model performance metrics are displayed:
- number of tokens generated per second;
- average time from receiving a request to generating the first token;
- average request execution time;
- average number of concurrently processed requests per second.
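To make the first two metrics concrete, here is how they could be derived from request timings. This is an illustrative calculation only; all numbers are made up and the service's actual measurement method is not documented here.

```python
def tokens_per_second(total_tokens: int, generation_seconds: float) -> float:
    """Generation throughput: tokens produced per second of generation."""
    return total_tokens / generation_seconds

def time_to_first_token(request_start: float, first_token_at: float) -> float:
    """Latency from receiving the request to emitting the first token."""
    return first_token_at - request_start

# Example: 512 tokens generated in 8 seconds
print(tokens_per_second(512, 8.0))        # 64.0 tokens/s
# Example: first token arrived 0.35 s after the request was received
print(time_to_first_token(10.0, 10.35))   # ~0.35 s
```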
Inference services work only in synchronous mode: the model's response is returned piece by piece as it is generated, as in chatbots.
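OpenAI-compatible streaming responses are typically delivered as server-sent events (SSE), where each `data:` line carries a JSON chunk with a piece of the answer. The toy parser below assembles the full text from such a stream; the sample fragment is hypothetical, not real service output.

```python
import json

# A hypothetical OpenAI-style SSE stream fragment (not real service output).
SAMPLE_STREAM = """\
data: {"choices": [{"delta": {"content": "Hel"}}]}

data: {"choices": [{"delta": {"content": "lo"}}]}

data: [DONE]
"""

def collect_stream_text(raw: str) -> str:
    """Concatenate content deltas from 'data:' lines, stopping at [DONE]."""
    parts = []
    for line in raw.splitlines():
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":   # sentinel marking the end of the stream
            break
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"]
        parts.append(delta.get("content", ""))
    return "".join(parts)

print(collect_stream_text(SAMPLE_STREAM))  # prints: Hello
```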
Inference services scale automatically: depending on the load, the number of nodes increases or decreases. The average time a request spends in the queue is used as the autoscaling metric. Autoscaling is limited by the limits of the Managed Kubernetes cluster.
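The scaling behavior described above can be sketched as a simple decision function: scale on average queue time, bounded by the cluster's node limits. The thresholds here are invented for illustration; the real autoscaler's logic and parameters are not public.

```python
def desired_nodes(current: int, avg_queue_seconds: float,
                  min_nodes: int, max_nodes: int,
                  scale_up_above: float = 2.0,
                  scale_down_below: float = 0.5) -> int:
    """Toy autoscaler: grow or shrink by one node based on queue time,
    then clamp to the cluster's node limits (thresholds are invented)."""
    if avg_queue_seconds > scale_up_above:
        current += 1          # requests are waiting too long: add a node
    elif avg_queue_seconds < scale_down_below:
        current -= 1          # queue is nearly empty: remove a node
    return max(min_nodes, min(max_nodes, current))

print(desired_nodes(2, 3.1, min_nodes=1, max_nodes=4))  # 3 (scale up)
print(desired_nodes(2, 0.1, min_nodes=1, max_nodes=4))  # 1 (scale down)
```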
Requests to an inference service are authorized with an access token. Each inference service has its own tokens, which you can manage.
Manage tokens
You can manage inference service access tokens through the control panel. The following operations are available:
- create an access token;
- delete an access token.
Create an application for testing
- In the control panel, in the top menu, click Products and select Foundation Models Catalog.
- In the model card, click Create.
- In the Application Description field, enter:
- the contact details of a technical specialist, needed to clarify the technical details of testing;
- optional: inference service requirements, such as the GPU for model testing and the expected model performance metrics.
- Click Create Request. A ticket requesting Foundation Models Catalog testing will be generated automatically.
- Wait for a Selectel employee to respond to the ticket.
Cost
During the limited testing period (private preview), you pay only for cloud platform resources, billed under the cloud platform payment model.
Top up your balance before creating a test request.
Prices for resources can be viewed at selectel.ru.