Quantization-Aware Training
Training a model while simulating the effects of quantization, so the model learns to maintain accuracy even when weights are later reduced to lower precision.
Why It Matters
QAT produces quantized models with minimal accuracy loss, substantially better than post-training quantization (converting an already-trained model), especially at aggressive compression levels (4-bit, 2-bit).
Example
Training a model whose forward passes simulate INT8 precision by rounding weights to the quantized grid, teaching the model to maintain accuracy within the constraints of reduced precision from the start.
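As a minimal sketch of this idea (plain Python with toy values, no ML framework), the common trick is a "fake quantize" step: the forward pass snaps the weight to an INT8-style grid, while the gradient update is applied to the full-precision weight, treating the rounding as identity for gradients (the straight-through estimator). The function names and numbers here are illustrative, not from any particular library.

```python
def fake_quantize(w, scale=1 / 16, num_bits=8):
    # Quantize-dequantize: snap w to the nearest INT8 grid point (clamped),
    # then map it back to float so the forward pass sees quantization error.
    qmax = 2 ** (num_bits - 1) - 1  # 127 for INT8
    q = max(-qmax, min(qmax, round(w / scale)))
    return q * scale

# Toy QAT loop: learn y = 2x with a single weight. The forward pass uses
# the quantized weight; the update is applied to the full-precision
# "shadow" weight (straight-through estimator: d(fake_quantize)/dw ~ 1).
w, lr = 0.3, 0.1
for _ in range(20):
    for x, y in [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]:
        wq = fake_quantize(w)        # simulate INT8 in the forward pass
        grad = 2 * (wq * x - y) * x  # dL/dwq for squared error
        w -= lr * grad               # update the full-precision weight

print(fake_quantize(w))  # the deployed (quantized) weight lands on 2.0
```

Because training always "saw" the rounded weight, the quantized model at the end has no deployment-time surprise: the weight that ships is the same one the loss was computed against.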
Think of it like...
Like a musician who practices on a small stage before performing there — they learn to adapt their performance to the constraints rather than being surprised by them.
Related Terms
Quantization
The process of reducing the precision of a model's numerical weights (e.g., from 32-bit to 8-bit or 4-bit), making the model smaller and faster while accepting a small trade-off in accuracy.
Edge Inference
Running AI models directly on local devices (phones, IoT sensors, cameras) rather than sending data to the cloud. This reduces latency, preserves privacy, and works without internet connectivity.