Quantization-Aware Training
Training a model while simulating the effects of quantization, so the model learns to maintain accuracy even when weights are later reduced to lower precision.
Why It Matters
QAT produces quantized models with minimal accuracy loss, substantially better than post-training quantization (converting an already-trained model), especially at aggressive compression levels (4-bit, 2-bit).
Example
Training a model whose forward passes simulate INT8 precision by rounding weights to the quantized grid, teaching the model to maintain accuracy within the constraints of reduced precision from the start.
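As a minimal sketch of this idea (plain Python with toy values, no ML framework), the common trick is a "fake quantize" step: the forward pass snaps the weight to an INT8-style grid, while the gradient update is applied to the full-precision weight, treating the rounding as identity for gradients (the straight-through estimator). The function names and numbers here are illustrative, not from any particular library.

```python
def fake_quantize(w, scale=1 / 16, num_bits=8):
    # Quantize-dequantize: snap w to the nearest INT8 grid point (clamped),
    # then map it back to float so the forward pass sees quantization error.
    qmax = 2 ** (num_bits - 1) - 1  # 127 for INT8
    q = max(-qmax, min(qmax, round(w / scale)))
    return q * scale

# Toy QAT loop: learn y = 2x with a single weight. The forward pass uses
# the quantized weight; the update is applied to the full-precision
# "shadow" weight (straight-through estimator: d(fake_quantize)/dw ~ 1).
w, lr = 0.3, 0.1
for _ in range(20):
    for x, y in [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]:
        wq = fake_quantize(w)        # simulate INT8 in the forward pass
        grad = 2 * (wq * x - y) * x  # dL/dwq for squared error
        w -= lr * grad               # update the full-precision weight

print(fake_quantize(w))  # the deployed (quantized) weight lands on 2.0
```

Because training always "saw" the rounded weight, the quantized model at the end has no deployment-time surprise: the weight that ships is the same one the loss was computed against.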
Think of it like...
Like a musician who practices on a small stage before performing there — they learn to adapt their performance to the constraints rather than being surprised by them.
Related Terms
Quantization
The process of reducing the precision of a model's numerical weights (e.g., from 32-bit to 8-bit or 4-bit), making the model smaller and faster while accepting a small trade-off in accuracy.
Edge Inference
Running AI models directly on local devices (phones, IoT sensors, cameras) rather than sending data to the cloud. This reduces latency, preserves privacy, and works without internet connectivity.