
Product Description Foundation Models Catalog

For your information

The product is in the public preview stage.

Foundation Models Catalog is a catalog of preconfigured ML models with a ready-to-use API. Models are deployed as separate inference services: isolated services with dedicated resources (GPU, vCPU, RAM, disk).

The product supports the Selectel access control model: user types, roles, projects, and project limits and quotas.

Tasks to be solved

  • working with models without deploying infrastructure on your own;

  • selecting an out-of-the-box infrastructure for expected or changing workloads. You can evaluate model performance metrics on different inference service configurations and select a suitable one, or configure automatic scaling;

  • testing and comparing different models for your projects. You can deploy multiple models and compare which one performs better for your tasks;

  • integrating models into your own projects via a dedicated endpoint.

Principle of operation

Foundation Models Catalog runs on cloud platform resources. Each model is deployed as a separate inference service on a Managed Kubernetes cluster with GPUs. Model weights are stored in Selectel S3 storage.

For fault tolerance, Foundation Models Catalog inference services are deployed in clusters in different locations. All clusters with inference services are centrally managed from a separate cluster via Flux CD. Multiple inference services can run in the same cluster; each is isolated from the others and scales independently.

Each inference service uses an inference server suited to the type of model. For example, large language models (LLMs) are served by the vLLM inference server.

Requests to models are executed through a dedicated endpoint: a public API compatible with the OpenAI API. Inference services work only in synchronous mode; the response from the model is streamed in chunks as it is generated, as in chatbots.
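As a minimal sketch, a request to such an endpoint follows the OpenAI chat-completions format. The endpoint URL, model name, and API key below are hypothetical placeholders, not real values:

```python
import json

# Placeholder values: substitute the endpoint, model name, and API key
# shown on your inference service page in the control panel.
ENDPOINT = "https://example.inference.selectel.ru/v1/chat/completions"
API_KEY = "YOUR_API_KEY"

# OpenAI-compatible chat-completion request body; "stream": True asks
# the server to return the answer in chunks as it is generated.
payload = {
    "model": "your-model-name",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": True,
}

body = json.dumps(payload)
print(body)
```

The same JSON body works with curl or any OpenAI-compatible client, since only the base URL and key change.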

How to work with Foundation Models Catalog

You can work with Foundation Models Catalog in the control panel or through the API. To get started, follow the Foundation Models Catalog: Quick Start instructions.

When creating an inference service, you can select its configuration. The list of available configurations depends on the selected model and its parameters. When selecting a configuration, you can see the expected performance metrics of the model.

After the inference service is created, a dedicated endpoint for working with the model is generated automatically. The endpoint is available in the control panel on the inference service page. To interact with the inference service, you can use curl requests and tools such as Postman, SoapUI, Open WebUI, and others. You can integrate the inference service into your own projects via the dedicated endpoint.

The inference service is accessed using API keys. API keys are unique to each inference service. You can manage the API keys.
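For illustration, the API key is typically passed as a bearer token in the Authorization header, as in the OpenAI API. The URL and key below are placeholders; the request is built but not sent:

```python
import json
import urllib.request

# Placeholder values: take the real endpoint and API key from the
# inference service page in the control panel.
ENDPOINT = "https://example.inference.selectel.ru/v1/chat/completions"
API_KEY = "YOUR_API_KEY"

payload = {
    "model": "your-model-name",
    "messages": [{"role": "user", "content": "Ping"}],
}

# Build the request object; urllib.request.urlopen(request) would
# execute it against a live endpoint.
request = urllib.request.Request(
    ENDPOINT,
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
    method="POST",
)

print(request.get_header("Authorization"))
```

Because each inference service has its own key, a key leaked from one service does not grant access to the others.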

You can scale an inference service depending on the number of requests to the model. To scale the inference service, change the number of inference instances (the deployed copies of the model). For more information, see the Scale inference service instruction.

Available models

You can view the current list of models in the control panel: from the top menu, select Products → Foundation Models Catalog.

Areas of responsibility

Selectel provides

  • infrastructure for creating inference services;

  • access to models via a public API compatible with the OpenAI API;

  • scalability of inference services;

  • a monitoring system for inference services in the control panel;

  • data storage security in accordance with the requirements of 152-FZ;

  • integration with other Selectel services;

  • technical support.

Selectel is not responsible for

  • integrating models into your projects;

  • the business logic of how models work in your projects.

Cost

Foundation Models Catalog is billed on a pay-as-you-go model. The balance is debited every hour for the previous hour of cloud platform resource usage. During the public preview, tokens are not charged. For more information, refer to the Foundation Models Catalog.
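As a back-of-the-envelope sketch of hourly pay-as-you-go billing (the hourly rate below is invented for illustration, not a real price):

```python
# Hypothetical hourly rate for an inference service configuration;
# real prices depend on the configuration you select.
hourly_rate = 150.0  # currency units per hour

# The balance is debited every hour for the previous hour of use,
# so the cost of a period is simply rate * hours the service existed.
hours_running = 24 * 7  # one week
weekly_cost = hourly_rate * hours_running
print(weekly_cost)  # 25200.0
```

Since tokens are not charged during the public preview, the number of requests does not affect this figure; only the service's configuration and uptime do.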

What is included in the price

  • an unlimited number of tokens;

  • a free domain name for public access to the model.

Limitations

Not supported in the Foundation Models Catalog:

  • working with models in asynchronous mode;

  • uploading your own models to the catalog;

  • deploying models from the on-premises catalog, in A-DC, on dedicated servers, or in the certified cloud.