Prompt Caching
A technique that stores and reuses the processed form of frequently used prompt prefixes, avoiding redundant computation. It speeds up inference and reduces costs for repeated prompts.
Why It Matters
Prompt caching can reduce API costs by 50-90% for applications with long, shared system prompts — like RAG systems that include the same context repeatedly.
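The savings claim is easy to sanity-check with back-of-the-envelope arithmetic. The sketch below uses illustrative numbers only (a made-up per-token price and an assumed 90% discount on cached input tokens, not any provider's real rates) to show how a long shared prefix drives the reduction into that range:

```python
# Back-of-the-envelope savings from prompt caching.
# Prices and discount are illustrative assumptions, not real provider rates.

def request_cost(prompt_tokens, cached_tokens,
                 price_per_token=3e-6, cached_discount=0.1):
    """Cost of one request when `cached_tokens` of the prompt hit the cache."""
    uncached = prompt_tokens - cached_tokens
    return uncached * price_per_token + cached_tokens * price_per_token * cached_discount

# A 2,000-token shared system prompt plus a 100-token user message,
# over 1,000 calls: the first call is a cache miss, the rest are hits.
cold = request_cost(2100, cached_tokens=0)
warm = request_cost(2100, cached_tokens=2000)
savings = 1 - (cold + 999 * warm) / (1000 * cold)
print(f"approximate cost reduction: {savings:.0%}")  # prints ~86%
```

With these assumed numbers the reduction lands around 86%; shorter user messages relative to the shared prefix push it higher, which is why long-system-prompt workloads benefit most.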
Example
A customer support bot with a 2,000-token system prompt: the prefix is cached after the first call, so subsequent calls skip reprocessing those tokens, saving time and money.
Think of it like...
Like keeping your frequently used ingredients pre-chopped in the fridge — you do not re-prepare them every time you cook, saving time on every meal.
Related Terms
Inference
The process of using a trained model to make predictions on new, previously unseen data. Inference is what happens when an AI model is deployed and actively serving results to users.
Latency
The time delay between sending a request to an AI model and receiving the response. In ML systems, latency includes data preprocessing, model inference, and network transmission time.
System Prompt
Hidden instructions provided to an LLM that define its behavior, personality, constraints, and capabilities for a conversation. System prompts set the rules of engagement before the user interacts.
API
Application Programming Interface — a set of rules and protocols that allow different software applications to communicate with each other. In AI, APIs let developers integrate AI capabilities into their applications.