Create a Managed Kubernetes cluster for Data Science
In a Managed Kubernetes cluster, you can run a container with pre-installed machine learning tools and run the Jupyter Notebook service in it.
You can use the container to train models and run inference when developing applications and working with data.
List of tools
The container includes the following packages:
- pip
- PyTorch
- TensorFlow
- Keras
- Anaconda
- Jupyter Notebook
- scikit-learn
- NumPy
- SciPy
- Pandas
- NLTK
- OpenCV
- CatBoost
- XGBoost
- LightGBM
Create a cluster for Data Science
- In the Control Panel, go to Cloud Platform → Kubernetes.
- Click Create Cluster.
- Select a node group configuration with at least 4 vCPUs, 8 GB of RAM, and a 20 GB SSD.
- Select the rest of the cluster settings (see the Create Managed Kubernetes cluster instructions for details) and click Create.
- Connect to the cluster.
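Once you are connected, it can help to confirm that kubectl reaches the cluster and that the nodes are ready before starting the container. The following is a minimal sketch, not part of the official instructions: the function name all_nodes_ready is hypothetical, and it assumes that kubectl is already configured for the new cluster.

```shell
# Hypothetical helper: succeeds only if kubectl lists at least one node
# and every listed node has the exact status "Ready".
# A cordoned node ("Ready,SchedulingDisabled") is treated as not ready.
all_nodes_ready() {
  kubectl get nodes --no-headers |
    awk '$2 != "Ready" { bad = 1 } END { exit (NR == 0 || bad) }'
}
```

For example, all_nodes_ready && echo "cluster is ready" prints the message only when all nodes report Ready.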
Start the container
- Download the YAML file with the deployment configuration.
- Start the container:
  kubectl apply -f selectel-ml.yaml
- Check the status of the container:
  kubectl get pod -w
- Wait for the Running status, which means that the container has been created and is running:
  selectel-ml 1/1 Running
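As an alternative to watching the pod list by hand, you can let kubectl block until the deployment is ready. This is a sketch under two assumptions: the deployment is named selectel-ml (the name used when exposing the service later in this guide), and 300 seconds is a reasonable default timeout. The function name wait_for_ml_deployment is hypothetical.

```shell
# Hypothetical helper: block until the selectel-ml deployment has finished
# rolling out, or fail once the timeout expires.
# $1 is an optional timeout (assumed default: 300s); increase it if the
# image takes a long time to pull.
wait_for_ml_deployment() {
  kubectl rollout status deployment/selectel-ml --timeout="${1:-300s}"
}
```

For example, wait_for_ml_deployment 600s waits up to ten minutes and returns a non-zero exit code if the deployment is still not ready.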
Launch Jupyter Notebook
- Open a port to access the service:
  kubectl expose deployment selectel-ml --type=LoadBalancer --name=my-service
- Get the address and port to connect to the Jupyter server:
  kubectl get services
  Example output:
  NAME         TYPE           CLUSTER-IP     EXTERNAL-IP   PORT(S)          AGE
  my-service   LoadBalancer   10.100.90.86   203.0.113.1   8888:31779/TCP   30s
- In the address bar of your browser, enter the address from EXTERNAL-IP and the port number from PORT(S), for example 203.0.113.1:8888.
- If <pending> is displayed in EXTERNAL-IP, run the kubectl get services command again after a few minutes.
- In the Jupyter Notebook web interface that opens, enter the default password: 9lG0eXCevt.
- Optional: Change the password according to the Jupyter Notebook instructions.
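Instead of re-running kubectl get services by hand while the external IP is pending, you can poll for it. The sketch below assumes the service is named my-service, as in the expose command, and that the load balancer reports an IP address rather than a hostname. The function name jupyter_url and the default retry count and delay are hypothetical.

```shell
# Hypothetical helper: print EXTERNAL-IP:PORT for a LoadBalancer service
# once the external IP has been assigned.
# $1 service name (default: my-service)
# $2 number of attempts (assumed default: 30)
# $3 delay between attempts in seconds (assumed default: 10)
jupyter_url() {
  svc="${1:-my-service}"
  tries="${2:-30}"
  delay="${3:-10}"
  i=0
  while [ "$i" -lt "$tries" ]; do
    # The ingress IP stays empty while EXTERNAL-IP shows <pending>.
    ip=$(kubectl get service "$svc" -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
    if [ -n "$ip" ]; then
      port=$(kubectl get service "$svc" -o jsonpath='{.spec.ports[0].port}')
      echo "${ip}:${port}"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "external IP for ${svc} is still pending" >&2
  return 1
}
```

Calling jupyter_url with no arguments checks my-service up to 30 times at 10-second intervals and prints the address to paste into the browser.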
Working with the container through the console
To work with the container from the console, open a shell in the pod:
kubectl exec -it [pod name] -- /bin/bash
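If you do not want to look up the pod name first, kubectl exec also accepts a deployment reference and picks one of its pods. The following sketch assumes the deployment is named selectel-ml, as in the expose command above. The function names ml_shell and ml_run are hypothetical.

```shell
# Hypothetical helper: open an interactive shell in a pod of the deployment.
ml_shell() {
  kubectl exec -it deployment/selectel-ml -- /bin/bash
}

# Hypothetical helper: run a single command in the container without
# opening a shell; all arguments are passed through unchanged.
ml_run() {
  kubectl exec deployment/selectel-ml -- "$@"
}
```

For example, ml_run pip list shows the packages installed in the container, assuming pip is on the container's PATH.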