Foundation Models Catalog Payment Model and Prices
Balance
Cloud platform resources are paid for from either the single balance or the cloud platform balance, depending on the balance type of your account.
You can pay for resources with different types of funds: main funds and bonuses.
Top up your balance before paying for resources.
Payment Model
The cloud platform uses a pay-as-you-go model. Every hour, the balance is debited for the cloud platform resources used during the previous hour.
Charges for inference service resources are calculated per project. Each project contains a group of resources: GPUs, vCPUs, RAM, and disk space.
The cost of the resource group is recalculated every full clock hour (for example, 13:00-14:00).
All cloud platform resources used by Foundation Models Catalog are quota-based. For quota-based resources, the charge for each hour is based on the maximum consumption during that hour.
For example, an inference service with two inference instances is created at 13:25. Each inference instance has the following configuration: 24 vCPUs, 160 GB RAM, 1 GPU, and a 300 GB disk. At 13:40, the inference service scales down to one inference instance. The bill for the hour 13:00-14:00 counts the maximum consumption: 48 vCPUs, 320 GB RAM, 2 GPUs, and 600 GB of disk space. The bill for the hour 14:00-15:00 counts only 24 vCPUs, 160 GB RAM, 1 GPU, and 300 GB of disk space, provided that the number of inference instances is not increased and no other inference services are created.
If the number of inference services in a project or the number of inference instances in an inference service increases, the resource charges change immediately.
For example, an inference service with one inference instance is created at 13:25. The inference instance has the following configuration: 24 vCPUs, 160 GB RAM, 1 GPU, and a 300 GB disk. At 13:40, the inference service scales up from one inference instance to two. For the hour 13:00-14:00, 48 vCPUs, 320 GB RAM, 2 GPUs, and 600 GB of disk space are charged.
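The two examples above follow the same rule: the hour is billed at its peak consumption. The rule can be sketched in a few lines of Python. This is only an illustration, not the platform's billing code; the function name billed_for_hour and the dictionary keys are hypothetical, and all inference instances are assumed to share one configuration.

```python
# Per-instance configuration taken from the examples above.
CONFIG = {"vcpu": 24, "ram_gb": 160, "gpu": 1, "disk_gb": 300}


def billed_for_hour(instance_counts, config=CONFIG):
    """Return the resources billed for one clock hour.

    instance_counts lists the number of inference instances observed
    during the hour. The hour is billed at the peak count, so the
    order of the observations does not matter.
    """
    peak = max(instance_counts, default=0)
    return {name: amount * peak for name, amount in config.items()}


# 13:00-14:00, scale-down example: none, then two at 13:25, then one at 13:40.
scale_down = billed_for_hour([0, 2, 1])
# 13:00-14:00, scale-up example: none, then one at 13:25, then two at 13:40.
scale_up = billed_for_hour([0, 1, 2])
# 14:00-15:00 after the scale-down: one instance for the whole hour.
next_hour = billed_for_hour([1])

print(scale_down)
print(scale_up)
print(next_hour)
```

Both scenarios for the hour 13:00-14:00 produce the same billed amounts, because only the peak number of inference instances within the hour matters.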
Resource blocking when the balance has insufficient funds
If your balance has insufficient funds at the time of debiting, all cloud platform resources are blocked automatically, and you continue to be charged for them.
To restore access to resources, top up your balance by the amount of the debt within 14 days after the blocking. The debt for resources charged during the blocking period is repaid automatically. Projects are not blocked: you can still delete the entire project or resources via the API.
If you do not top up your balance by the amount of the debt within 14 days after the blocking, all cloud platform resources are deleted. Projects are not deleted.
To keep enough funds on your balance at all times, you can set up balance notifications and auto-replenishment.
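The blocking timeline above can be expressed as a small calculation. The sketch below is illustrative only. It assumes that the 14 days are counted from the exact moment of blocking and that a top-up made exactly at the deadline still counts, neither of which is specified here. The names deletion_deadline and resource_state are hypothetical and are not part of any platform API.

```python
from datetime import datetime, timedelta

# Period stated above: resources are deleted if the debt is not repaid in time.
GRACE_PERIOD = timedelta(days=14)


def deletion_deadline(blocked_at):
    """Latest moment to repay the debt (assumed: counted from the blocking time)."""
    return blocked_at + GRACE_PERIOD


def resource_state(blocked_at, now, repaid_at=None):
    """Return 'active', 'blocked', or 'deleted' for resources at time `now`.

    repaid_at is the moment the debt was fully covered, or None if it
    has not been covered; it is expected not to be later than `now`.
    """
    deadline = deletion_deadline(blocked_at)
    if repaid_at is not None and repaid_at <= deadline:
        return "active"
    if now <= deadline:
        return "blocked"
    return "deleted"


blocked_at = datetime(2024, 3, 1, 14, 0)
print(deletion_deadline(blocked_at))
print(resource_state(blocked_at, datetime(2024, 3, 10, 9, 0)))
print(resource_state(blocked_at, datetime(2024, 3, 10, 9, 0), repaid_at=datetime(2024, 3, 9, 12, 0)))
print(resource_state(blocked_at, datetime(2024, 3, 20, 9, 0)))
```

In every state, the project itself remains available, so only the resources are modeled here.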
Prices
The cost depends on the selected inference service configuration. When you use an inference service, you pay only for cloud platform resources: GPUs, vCPUs, RAM, and disk space. The cost does not depend on the number of tokens.
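As a rough illustration of how the hourly cost follows from the configuration alone, consider the sketch below. The unit prices are placeholders invented for the example and are not actual prices. The function name hourly_cost and the dictionary keys are also hypothetical. The calculation takes no token count as input, which reflects that the cost does not depend on tokens.

```python
from decimal import Decimal

# Placeholder hourly unit prices, NOT real prices.
PRICES = {
    "gpu": Decimal("100"),
    "vcpu": Decimal("1"),
    "ram_gb": Decimal("0.5"),
    "disk_gb": Decimal("0.01"),
}


def hourly_cost(resources, prices=PRICES):
    """Sum unit price times amount over all billed resources for one hour."""
    return sum(
        (prices[name] * amount for name, amount in resources.items()),
        Decimal("0"),
    )


# One inference instance: 1 GPU, 24 vCPUs, 160 GB RAM, 300 GB disk.
one_instance = {"gpu": 1, "vcpu": 24, "ram_gb": 160, "disk_gb": 300}
# Two such instances double every amount, and therefore the cost.
two_instances = {name: amount * 2 for name, amount in one_instance.items()}

print(hourly_cost(one_instance))
print(hourly_cost(two_instances))
```

Decimal is used instead of float so that monetary amounts are not affected by binary rounding errors.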
You can view resource prices when creating an inference service in the control panel.
Reporting documents
Reporting documents are available after payment.