
Knowledge Distillation

A model compression technique in which a smaller 'student' model is trained to mimic the behavior of a larger 'teacher' model. The student learns not just the correct answers (hard labels) but the teacher's full output probability distribution ('soft targets'), which encodes how the teacher weighs the incorrect classes too.
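The soft-target idea can be sketched with the standard temperature-scaled KL divergence loss from Hinton et al.'s distillation formulation. This is a minimal illustration, not a training loop; the logit values and function names are made up for the example.

```python
import math

def softmax(logits, temperature=1.0):
    # Divide logits by the temperature T: higher T yields a smoother
    # distribution that exposes the teacher's ranking of wrong classes.
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=4.0):
    # KL divergence between the softened teacher and student distributions,
    # scaled by T^2 so gradient magnitudes stay comparable across temperatures.
    p = softmax(teacher_logits, temperature)  # teacher "soft targets"
    q = softmax(student_logits, temperature)  # student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return (temperature ** 2) * kl

# The teacher is confident in class 0 but still assigns some probability
# to class 1 -- that relative weighting is what the student learns to match.
teacher = [6.0, 3.5, -1.0]
student = [4.0, 1.0, 0.0]
loss = distillation_loss(teacher, student)
```

In practice this term is combined with an ordinary cross-entropy loss on the hard labels; minimizing it drives the student's distribution toward the teacher's, and the loss is exactly zero when the two sets of logits agree.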

Why It Matters

Distillation lets you deploy AI at a fraction of the teacher's cost and latency. A well-distilled model can retain most of the teacher's capability, often 90%+ on target benchmarks, at roughly a tenth of the size.

Example

OpenAI's GPT-4o mini, widely regarded as a distilled version of GPT-4: smaller, faster, and cheaper while retaining most of the larger model's capabilities.

Think of it like...

Like a senior mentor training a junior colleague: the junior cannot replicate all of the senior's experience but can learn their key decision-making patterns.

Related Terms