Performance metrics of the inference-service model

Metrics help you evaluate model performance across different configurations and choose the one that fits your workload. The expected performance metrics for the model are displayed on each configuration card when you create an inference service.

Avg Time to First Token

Average time from receiving a request to generating the first token, in milliseconds

Avg Request Throughput

Average number of requests processed per second

Output Token Throughput

Average number of generated tokens per second

Request Latency

Average time from receiving a request to returning the complete response, in seconds
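
You can also measure these metrics from the client side. The sketch below is a minimal illustration, not the service's own benchmarking method: fake_stream is a hypothetical stand-in for a streaming inference client that yields tokens one by one, and measure computes the four metrics above for a few sequential requests.

```python
import time

def fake_stream(n_tokens=50, delay=0.01):
    """Hypothetical stand-in for a streaming inference client.
    Replace with your actual client that yields tokens one by one."""
    time.sleep(0.1)  # simulated delay before the first token
    for _ in range(n_tokens):
        time.sleep(delay)
        yield "token"

def measure(stream):
    """Compute per-request metrics from a token stream."""
    start = time.perf_counter()
    first_token_at = None
    n_tokens = 0
    for _ in stream:
        if first_token_at is None:
            first_token_at = time.perf_counter()
        n_tokens += 1
    latency = time.perf_counter() - start          # Request Latency, s
    ttft_ms = (first_token_at - start) * 1000      # Time to First Token, ms
    tok_per_s = n_tokens / latency                 # Output Token Throughput
    return ttft_ms, latency, tok_per_s

n_requests = 3
t0 = time.perf_counter()
for _ in range(n_requests):  # sequential here; real benchmarks run requests concurrently
    ttft_ms, latency, tok_per_s = measure(fake_stream())
    print(f"TTFT: {ttft_ms:.0f} ms, latency: {latency:.2f} s, {tok_per_s:.1f} tok/s")
elapsed = time.perf_counter() - t0
print(f"Avg Request Throughput: {n_requests / elapsed:.2f} req/s")
```

Averaging these per-request values over a representative load gives figures comparable to the ones shown on the configuration cards.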