Latency

Level 2

Short Description

The time it takes for a model to produce a response after receiving a request.

Friendly Description: Latency is just the wait time: the gap between when you ask the AI something and when it starts answering. Low latency feels snappy, like a friend who replies right away. High latency feels sluggish, like waiting for someone to look up from their book. Smaller, more efficient models usually have lower latency, while bigger, smarter models can take a moment.

Example: When you tap a voice assistant on your phone and ask a quick question, you expect the answer in well under a second. That snappy feel comes from very low latency, often achieved by running smaller models close to the device instead of out on a distant server.
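The wait time described above is easy to measure yourself: record a timestamp before sending a request, another when the response arrives, and take the difference. The sketch below shows the idea in Python; `fake_model_respond` is a hypothetical stand-in for a real model call, with a short sleep simulating processing time.

```python
import time

def fake_model_respond(prompt):
    # Hypothetical stand-in for a model call. In a real system the
    # delay comes from network round trips plus model inference time.
    time.sleep(0.05)  # simulate ~50 ms of processing
    return "answer"

# Measure latency: elapsed time between request and response.
start = time.perf_counter()
reply = fake_model_respond("What's the weather today?")
latency_ms = (time.perf_counter() - start) * 1000

print(f"Latency: {latency_ms:.0f} ms")
```

In practice, services often track latency as a distribution (median, 95th percentile) rather than a single number, since occasional slow responses hurt the experience even when the average looks fine.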