
Product Description Foundation Models Catalog

For your information

The product is in the public preview stage.

Foundation Models Catalog is a catalog of preconfigured ML models with a ready-to-use API. Models are deployed as separate inference services: isolated services with dedicated resources (GPU, vCPU, RAM, disk).

The product supports the Selectel access control model: user types, roles, projects, and project limits and quotas.

Tasks to be solved

  • working with models without deploying infrastructure on your own;

  • selecting an out-of-the-box infrastructure for expected or changing workloads. You can evaluate model performance metrics on different inference service configurations and select a suitable one, or configure automatic scaling;

  • testing and comparing different models for your projects. You can deploy multiple models and compare which one performs better for your tasks;

  • integrating models into your own projects via a dedicated endpoint.

Principle of operation

Foundation Models Catalog runs on cloud platform resources. Each model is deployed as a separate inference service on a Managed Kubernetes cluster with GPUs. Model weights are stored in Selectel S3 storage.

For fault tolerance, Foundation Models Catalog inference services are deployed in clusters in different locations. All clusters with inference services are centrally managed from a separate cluster via Flux CD. Multiple inference services can run in the same cluster; each is isolated from the others and scales independently.

Each inference service uses an inference server suited to the type of model. For example, large language models (LLMs) are served by the vLLM inference server.

Requests to models are executed through a dedicated endpoint: a public API compatible with the OpenAI API. Inference services work only in synchronous mode; the response from the model is streamed in chunks as it is generated, as in chatbots.
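As a minimal sketch, a request to such an endpoint follows the OpenAI chat-completions format. The endpoint URL, model name, and API key below are hypothetical placeholders, not real values:

```python
import json

# Placeholder values: substitute the endpoint, model name, and API key
# shown on your inference service page in the control panel.
ENDPOINT = "https://example.inference.selectel.ru/v1/chat/completions"
API_KEY = "YOUR_API_KEY"

# OpenAI-compatible chat-completion request body; "stream": True asks
# the server to return the answer in chunks as it is generated.
payload = {
    "model": "your-model-name",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": True,
}

body = json.dumps(payload)
print(body)
```

The same JSON body works with curl or any OpenAI-compatible client, since only the base URL and key change.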

How to work with Foundation Models Catalog

You can work with Foundation Models Catalog in the control panel or through the API. To get started, follow the Foundation Models Catalog: Quick Start instructions.

When creating an inference service, you can select its configuration. The list of available configurations depends on the selected model and its parameters. When selecting a configuration, you can see the expected performance metrics of the model.

After the inference service is created, a dedicated endpoint for working with the model is generated automatically. The endpoint is available in the control panel on the inference service page. To interact with the inference service, you can use curl requests and tools such as Postman, SoapUI, Open WebUI, and others. You can integrate the inference service into your own projects via the dedicated endpoint.

The inference service is accessed using API keys. API keys are unique to each inference service. You can manage the API keys.
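For illustration, the API key is typically passed as a bearer token in the Authorization header, as in the OpenAI API. The URL and key below are placeholders; the request is built but not sent:

```python
import json
import urllib.request

# Placeholder values: take the real endpoint and API key from the
# inference service page in the control panel.
ENDPOINT = "https://example.inference.selectel.ru/v1/chat/completions"
API_KEY = "YOUR_API_KEY"

payload = {
    "model": "your-model-name",
    "messages": [{"role": "user", "content": "Ping"}],
}

# Build the request object; urllib.request.urlopen(request) would
# execute it against a live endpoint.
request = urllib.request.Request(
    ENDPOINT,
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
    method="POST",
)

print(request.get_header("Authorization"))
```

Because each inference service has its own key, a key leaked from one service does not grant access to the others.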

You can scale an inference service depending on the number of requests to the model. To scale the inference service, change the number of inference instances (the deployed copies of the model). For more information, see the Scale inference service instruction.

Available models

You can view the current list of models in the control panel: from the top menu, select Products → Foundation Models Catalog.

Areas of responsibility

Selectel provides

  • infrastructure for creating inference services;

  • access to models via a public API compatible with the OpenAI API;

  • scalability of inference services;

  • a monitoring system for inference services in the control panel;

  • data storage security in accordance with the requirements of 152-FZ;

  • integration with other Selectel services;

  • technical support.

Selectel is not responsible for

  • integrating models into your projects;

  • the business logic of how models work in your projects.

Cost

Foundation Models Catalog is billed on a pay-as-you-go model. The balance is debited every hour for the previous hour of cloud platform resource usage. During the public preview, tokens are not charged. For more information, refer to the Foundation Models Catalog.
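As a back-of-the-envelope sketch of hourly pay-as-you-go billing (the hourly rate below is invented for illustration, not a real price):

```python
# Hypothetical hourly rate for an inference service configuration;
# real prices depend on the configuration you select.
hourly_rate = 150.0  # currency units per hour

# The balance is debited every hour for the previous hour of use,
# so the cost of a period is simply rate * hours the service existed.
hours_running = 24 * 7  # one week
weekly_cost = hourly_rate * hours_running
print(weekly_cost)  # 25200.0
```

Since tokens are not charged during the public preview, the number of requests does not affect this figure; only the service's configuration and uptime do.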

What is included in the price

  • an unlimited number of tokens;

  • a free domain name for public access to the model.

Limitations

Not supported in the Foundation Models Catalog:

  • working with models in asynchronous mode;

  • uploading your own models to the catalog;

  • deploying models from the on-premises catalog, in A-DC, on dedicated servers, or in the certified cloud.