Performance metrics of the inference-service model

Metrics help you compare the performance of a model across different configurations and select the one that suits you. Metrics are displayed in the card of each configuration when you create an inference service.

Avg Time to First Token

Average time from receiving a request to generating the first token, in milliseconds

Avg Request Throughput

Average number of requests processed per second

Output Token Throughput

Average number of generated tokens per second

Request Latency

Average time from receiving a request to returning the complete response, in seconds
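
The metrics above can be computed from per-request timestamps. The sketch below is a minimal illustration of how such values are typically derived, not the service's actual implementation; the `RequestRecord` fields and metric formulas are assumptions for the example.

```python
from dataclasses import dataclass

@dataclass
class RequestRecord:
    arrival: float       # request received, seconds
    first_token: float   # first token generated, seconds
    finished: float      # full response returned, seconds
    output_tokens: int   # number of generated tokens

def compute_metrics(records: list[RequestRecord]) -> dict[str, float]:
    """Compute the four metrics over a batch of completed requests (illustrative)."""
    # Observation window: from the earliest arrival to the latest completion
    span = max(r.finished for r in records) - min(r.arrival for r in records)
    total_tokens = sum(r.output_tokens for r in records)
    return {
        # Avg Time to First Token, in milliseconds
        "avg_time_to_first_token_ms":
            sum((r.first_token - r.arrival) * 1000 for r in records) / len(records),
        # Avg Request Throughput: requests completed per second
        "avg_request_throughput_rps": len(records) / span,
        # Output Token Throughput: generated tokens per second
        "output_token_throughput_tps": total_tokens / span,
        # Request Latency: average receipt-to-completion time, in seconds
        "request_latency_s":
            sum(r.finished - r.arrival for r in records) / len(records),
    }

# Example: two requests over a 2-second window
records = [
    RequestRecord(arrival=0.0, first_token=0.2, finished=1.0, output_tokens=40),
    RequestRecord(arrival=0.5, first_token=0.8, finished=2.0, output_tokens=60),
]
metrics = compute_metrics(records)
# → {'avg_time_to_first_token_ms': 250.0, 'avg_request_throughput_rps': 1.0,
#    'output_token_throughput_tps': 50.0, 'request_latency_s': 1.25}
```

A lower Avg Time to First Token and Request Latency, and a higher throughput, generally indicate a configuration better suited to your load.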