Performance metrics of the inference-service model

Metrics help you evaluate model performance across different configurations and choose the one that fits your workload. The expected performance metrics for the model are displayed on each configuration card when you create an inference service.

Avg Time to First Token

Average time from receiving a request to generating the first token, in milliseconds

Avg Request Throughput

Average number of requests processed per second

Output Token Throughput

Average number of generated tokens per second

Request Latency

Average time from receiving a request to returning the complete response, in seconds
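
You can also measure these metrics from the client side. The sketch below is a minimal illustration, not the service's own benchmarking method: fake_stream is a hypothetical stand-in for a streaming inference client that yields tokens one by one, and measure computes the four metrics above for a few sequential requests.

```python
import time

def fake_stream(n_tokens=50, delay=0.01):
    """Hypothetical stand-in for a streaming inference client.
    Replace with your actual client that yields tokens one by one."""
    time.sleep(0.1)  # simulated delay before the first token
    for _ in range(n_tokens):
        time.sleep(delay)
        yield "token"

def measure(stream):
    """Compute per-request metrics from a token stream."""
    start = time.perf_counter()
    first_token_at = None
    n_tokens = 0
    for _ in stream:
        if first_token_at is None:
            first_token_at = time.perf_counter()
        n_tokens += 1
    latency = time.perf_counter() - start          # Request Latency, s
    ttft_ms = (first_token_at - start) * 1000      # Time to First Token, ms
    tok_per_s = n_tokens / latency                 # Output Token Throughput
    return ttft_ms, latency, tok_per_s

n_requests = 3
t0 = time.perf_counter()
for _ in range(n_requests):  # sequential here; real benchmarks run requests concurrently
    ttft_ms, latency, tok_per_s = measure(fake_stream())
    print(f"TTFT: {ttft_ms:.0f} ms, latency: {latency:.2f} s, {tok_per_s:.1f} tok/s")
elapsed = time.perf_counter() - t0
print(f"Avg Request Throughput: {n_requests / elapsed:.2f} req/s")
```

Averaging these per-request values over a representative load gives figures comparable to the ones shown on the configuration cards.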