KV Cache
Key-Value Cache — a mechanism that stores previously computed attention key and value vectors during autoregressive generation, avoiding redundant computation for tokens already processed.
Why It Matters
KV caching is essential for efficient LLM inference. Without it, generating each new token would require recomputing the attention keys and values for every previous token from scratch, so per-token cost would grow with sequence length.
Example
After generating 100 tokens, the KV cache holds the key-value pairs for all 100 tokens, so generating token 101 only requires computing attention for the new token against the cached entries.
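The example above can be sketched numerically. This is a minimal NumPy illustration (not any library's actual implementation): the projection matrices, dimensions, and random hidden states are all hypothetical, and it shows only that attention over cached keys/values for the first 100 tokens plus the new token's key/value gives the same result as recomputing everything.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8                              # head dimension (illustrative)
seq = rng.normal(size=(101, d))    # hidden states for 101 tokens (hypothetical)

# Hypothetical query/key/value projection weights.
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

def attend(q, K, V):
    """Scaled dot-product attention for a single query vector."""
    scores = K @ q / np.sqrt(d)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V

# Without caching: recompute keys and values for all 101 tokens.
K_full = seq @ Wk
V_full = seq @ Wv

# With caching: K/V for the first 100 tokens were stored earlier;
# only the new token's key, value, and query are computed now.
K_cache = seq[:100] @ Wk
V_cache = seq[:100] @ Wv
K_cached = np.vstack([K_cache, seq[100] @ Wk])
V_cached = np.vstack([V_cache, seq[100] @ Wv])

q = seq[100] @ Wq
out_full = attend(q, K_full, V_full)
out_cached = attend(q, K_cached, V_cached)
assert np.allclose(out_full, out_cached)  # identical output, far less recomputation
```

The cached path performs one key/value projection per step instead of 101, which is where the savings come from in real inference stacks.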
Think of it like...
Like taking notes during a conversation — instead of replaying the entire conversation to answer each new question, you just reference your notes.
Related Terms
Inference
The process of using a trained model to make predictions on new, previously unseen data. Inference is what happens when an AI model is deployed and actively serving results to users.
Latency
The time delay between sending a request to an AI model and receiving the response. In ML systems, latency includes data preprocessing, model inference, and network transmission time.
Context Window
The maximum amount of text (measured in tokens) that a language model can process in a single interaction. It includes both the input prompt and the generated output. Larger context windows allow models to handle longer documents.
Transformer
A neural network architecture introduced in 2017 that uses self-attention mechanisms to process sequential data in parallel rather than sequentially. Transformers are the foundation of modern LLMs like GPT, Claude, and Gemini.