Latency
The time delay between sending a request to an AI model and receiving the response. In ML systems, latency includes data preprocessing, model inference, and network transmission time.
Why It Matters
Latency determines user experience — a chatbot with 10-second response times feels broken, while one with 200ms feels instant. It is a critical production metric.
Example
An LLM API call might take 800 ms to return the first token (time-to-first-token, or TTFT) and 3 seconds to generate the complete response. Both numbers matter: TTFT governs how responsive the app feels, while total latency governs how long the user waits for the full answer.
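A minimal sketch of measuring both latencies, using a simulated streaming response in place of a real API (the delays and the `fake_stream` generator are hypothetical stand-ins):

```python
import time

def fake_stream(n_tokens=5, first_delay=0.8, per_token=0.1):
    # Hypothetical stand-in for a streaming LLM API call:
    # one long pause before the first token, then steady token emission.
    time.sleep(first_delay)
    for i in range(n_tokens):
        yield f"tok{i}"
        time.sleep(per_token)

start = time.perf_counter()
first_token_at = None
for token in fake_stream():
    if first_token_at is None:
        # Time-to-first-token: elapsed time until the first chunk arrives.
        first_token_at = time.perf_counter() - start
# Total latency: elapsed time until the full response is generated.
total = time.perf_counter() - start

print(f"TTFT: {first_token_at:.2f}s, total: {total:.2f}s")
```

The same two-timestamp pattern works with any real streaming client: record the clock before the request, again at the first chunk, and again after the stream closes.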
Think of it like...
Like the wait time at a restaurant — from when you place your order to when food arrives. Some dishes (complex queries) naturally take longer than others.
Related Terms
Throughput
The number of requests or predictions a model can process in a given time period. High throughput means the system can serve many users simultaneously.
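The relationship between the two metrics can be sketched with hypothetical numbers: at a fixed per-request latency, throughput scales with how many requests the system handles in parallel.

```python
# Hypothetical numbers illustrating throughput vs. latency.
latency_s = 0.2          # each request takes 200 ms to serve
concurrent_workers = 8   # requests handled in parallel

# Each worker completes 1 / 0.2 = 5 requests per second,
# so 8 workers sustain 40 requests per second overall.
throughput_rps = concurrent_workers / latency_s
print(throughput_rps)  # → 40.0
```

Note the trade-off this implies: batching requests together usually raises throughput but can add queueing delay, worsening per-request latency.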
Model Serving
The infrastructure and process of deploying trained ML models to production where they can receive requests and return predictions in real time. It includes scaling, load balancing, and version management.
Inference
The process of using a trained model to make predictions on new, previously unseen data. Inference is what happens when an AI model is deployed and actively serving results to users.
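A toy sketch of what inference means in code: the weights below are hypothetical, as if produced by an earlier training run, and prediction is just applying them to a new input.

```python
# Hypothetical parameters "learned" during a prior training run.
weights = [0.8, -0.3]
bias = 0.1

def predict(features):
    # Inference for a linear classifier: weighted sum plus bias,
    # thresholded into a class label. No learning happens here.
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if score > 0 else 0

# A new, previously unseen data point.
print(predict([1.0, 0.5]))  # score = 0.8 - 0.15 + 0.1 = 0.75 → class 1
```

Real deployed models are far larger, but the shape is the same: fixed parameters, new input, a forward pass, an output.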