Mixed Precision Training
Training neural networks using a combination of 16-bit and 32-bit floating-point numbers to speed up computation and reduce memory usage while maintaining model accuracy.
Why It Matters
On GPUs with hardware support for 16-bit math (e.g., NVIDIA Tensor Cores), mixed precision training can nearly double throughput and roughly halve activation memory, making it a standard practice for efficient model training.
Example
Using FP16 for the forward and backward passes (fast, memory-efficient) while keeping a master copy of the weights in FP32 (accurate) to prevent numerical instability — often combined with loss scaling, which multiplies the loss by a large constant so that small gradients don't underflow to zero in FP16's narrow range.
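The FP32-master-weights pattern can be sketched in plain NumPy. This is an illustrative toy (a one-parameter linear model with hand-written gradients), not a real framework API; the function name and the loss-scale value are assumptions for the example.

```python
import numpy as np

def mixed_precision_step(master_w, x, y, lr=0.1, loss_scale=1024.0):
    """One SGD step: FP16 compute, FP32 master weights, loss scaling.

    Toy linear model pred = w * x trained with mean squared error.
    """
    w16 = master_w.astype(np.float16)      # cast weights down for compute
    x16 = x.astype(np.float16)
    pred = x16 * w16                       # FP16 forward pass
    err = pred - y.astype(np.float16)
    # Scale the error before the backward pass so tiny gradients
    # don't underflow to zero in FP16's narrow range.
    scaled_err = err * np.float16(loss_scale)
    grad16 = 2 * scaled_err * x16          # FP16 backward pass
    grad32 = grad16.astype(np.float32) / loss_scale  # unscale in FP32
    return master_w - lr * grad32.mean()   # update the FP32 master copy

w = np.float32(0.0)
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])              # true relationship: y = 2x
for _ in range(50):
    w = mixed_precision_step(w, x, y)
print(round(float(w), 2))                  # converges toward the true weight, 2.0
```

Note the division of labor: all the heavy arithmetic happens in FP16, but the weight update accumulates in FP32, where tiny per-step changes that FP16 would round away are preserved.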
Think of it like...
Like using a rough sketch for quick drafts but keeping a precise blueprint for final measurements — speed where it matters, precision where it counts.
Related Terms
GPU
Graphics Processing Unit — originally designed for rendering graphics, GPUs excel at the parallel mathematical operations needed for training and running AI models. They are the primary hardware for modern AI.
Quantization
The process of reducing the precision of a model's numerical weights (e.g., from 32-bit to 8-bit or 4-bit), making the model smaller and faster while accepting a small trade-off in accuracy.
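A minimal sketch of one common quantization scheme — symmetric per-tensor 8-bit quantization — shows the precision trade-off concretely. The function names are illustrative, not from any particular library.

```python
import numpy as np

def quantize_int8(w):
    """Map float32 weights onto int8 in [-127, 127] with a single scale."""
    scale = np.abs(w).max() / 127.0        # one scale for the whole tensor
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float32 weights for computation."""
    return q.astype(np.float32) * scale

w = np.array([0.5, -1.2, 0.03, 2.4], dtype=np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
# Each weight now fits in 1 byte instead of 4; the reconstruction
# error is bounded by half the quantization step (scale / 2).
print(q.dtype, float(np.abs(w - w_hat).max()))
```

The storage savings (4x here) come at the cost of a small, bounded rounding error per weight — the "small trade-off in accuracy" described above.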