General information about the ML Platform product
The Selectel ML Platform is a ready-made infrastructure for ML development processes such as training and deploying ML models. The infrastructure consists of software and hardware components that are preconfigured and ready for use.
Any of the available cloud server configurations can be used for ML Platform components. After the platform is connected, you can extend it with your own software components. The following components have been tested on the platform:
- ClearML;
- Kubeflow — for more details on installing Kubeflow, see the Install Kubeflow guide.
Selectel imposes no additional restrictions on managing the ML Platform cluster.
Platform components
By default, the ML Platform consists of:
- hardware components:
  - cloud platform: the base for Managed Kubernetes with NVIDIA® GPUs (Tesla T4, A2, A30, A100, A2000, A5000, GTX 1080, RTX 2080 Ti);
- software components:
  - preconfigured Managed Kubernetes clusters;
  - a domain for accessing the Managed Kubernetes cluster;
  - Keycloak SSO: authorization in internal platform services;
  - Prom Stack: monitoring of platform components;
  - Forecastle: the platform homepage;
  - S3: storage for datasets and experiment data;
  - Container Registry: container image storage.
In the Managed Kubernetes clusters:
- GPU drivers are installed;
- nodes are annotated;
- the GPU resources required for computing are added;
- networking is configured, including Traefik as the Kubernetes Ingress controller.
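As an illustration of how workloads can consume the added GPU resources, the following Python sketch builds a minimal Pod manifest that requests GPUs. It assumes the cluster exposes GPUs under the extended resource name nvidia.com/gpu (the name registered by the NVIDIA device plugin); the function name gpu_pod_manifest, the pod name, and the nvidia/cuda image tag are hypothetical and should be checked against your cluster before use.

```python
import json


def gpu_pod_manifest(name: str, image: str, gpus: int = 1) -> dict:
    """Build a minimal Pod manifest that requests NVIDIA GPUs.

    Assumption: GPUs are exposed as the extended resource "nvidia.com/gpu".
    """
    if gpus < 1:
        raise ValueError("gpus must be at least 1")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # Run once as a smoke test instead of restarting forever.
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": name,
                    "image": image,
                    # Prints the GPUs visible inside the container.
                    "command": ["nvidia-smi"],
                    # Extended resources are requested via limits; with no
                    # explicit request, Kubernetes uses the limit as the request.
                    "resources": {"limits": {"nvidia.com/gpu": gpus}},
                }
            ],
        },
    }


if __name__ == "__main__":
    # Hypothetical image tag; replace with an image available to your cluster.
    manifest = gpu_pod_manifest("gpu-smoke-test", "nvidia/cuda:12.2.0-base-ubuntu22.04")
    print(json.dumps(manifest, indent=2))
```

Because kubectl accepts JSON manifests, the script output can be piped to kubectl apply -f - to schedule the pod on a GPU node.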
If ClearML is installed in a platform cluster, you manage it directly through the ClearML SDK installed in your own IDE. ClearML runs ML experiments on the cluster nodes. The ClearML architecture supports different component configurations:
- a single Managed Kubernetes cluster for all ML tasks;
- several Managed Kubernetes clusters, one for each task type (inference and training);
- a dedicated server connected as a compute node for ML experiments.
Connect platform
- In the control panel, from the top menu, click Products and select ML Platform.
- Click Create a test request.
- Select the data type.
- Specify the data volume in GB or MB.
- Optional: to help us recommend suitable ways to connect your data to the ML Platform, enter your data source, for example Selectel, on-premises infrastructure, or another cloud provider.
- Optional: if you have specific data security requirements for the test, check the box There are additional data security requirements for the test and describe the requirements in the Request comments field.
- Specify the model size in GB or MB.
- Specify the number of people who will use the platform simultaneously.
- Select the GPU model you need or check the box No GPU model requirements. You can view GPU specifications in the Available GPUs subsection of the Create a cloud server with GPU guide.
- Enter the contact details of a technical specialist. We need them to clarify technical details of the test.
- Optional: enter comments for the request. For example, specify desired tools, components, or data security requirements for the test.
- Click Send request. A ticket with the ML Platform test request will be created automatically.
- Wait for a Selectel employee to respond in the ticket. They will contact you to clarify the details of setting up the ML Platform.
Cost
The cost of the ML Platform is calculated after the request is processed and a configuration is selected. It consists only of the cost of the platform components: the Managed Kubernetes cluster, S3, and Container Registry.
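Written as a formula, the total is the sum of the component costs. The symbols below are introduced only for illustration; the value of each term depends on the configuration agreed on in the request.

```latex
C_{\text{ML Platform}} = C_{\text{Managed Kubernetes}} + C_{\text{S3}} + C_{\text{Container Registry}}
```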