Mixed Precision Training
Training neural networks using a combination of 16-bit and 32-bit floating-point numbers to speed up computation and reduce memory usage while maintaining model accuracy.
Why It Matters
On GPUs with hardware support for 16-bit math (e.g., NVIDIA Tensor Cores), mixed precision training can nearly double throughput and roughly halve activation memory, making it a standard practice for efficient model training.
Example
Using FP16 for the forward and backward passes (fast, memory-efficient) while keeping a master copy of the weights in FP32 (accurate) to prevent numerical instability — often combined with loss scaling, which multiplies the loss by a large constant so that small gradients don't underflow to zero in FP16's narrow range.
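The FP32-master-weights pattern can be sketched in plain NumPy. This is an illustrative toy (a one-parameter linear model with hand-written gradients), not a real framework API; the function name and the loss-scale value are assumptions for the example.

```python
import numpy as np

def mixed_precision_step(master_w, x, y, lr=0.1, loss_scale=1024.0):
    """One SGD step: FP16 compute, FP32 master weights, loss scaling.

    Toy linear model pred = w * x trained with mean squared error.
    """
    w16 = master_w.astype(np.float16)      # cast weights down for compute
    x16 = x.astype(np.float16)
    pred = x16 * w16                       # FP16 forward pass
    err = pred - y.astype(np.float16)
    # Scale the error before the backward pass so tiny gradients
    # don't underflow to zero in FP16's narrow range.
    scaled_err = err * np.float16(loss_scale)
    grad16 = 2 * scaled_err * x16          # FP16 backward pass
    grad32 = grad16.astype(np.float32) / loss_scale  # unscale in FP32
    return master_w - lr * grad32.mean()   # update the FP32 master copy

w = np.float32(0.0)
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])              # true relationship: y = 2x
for _ in range(50):
    w = mixed_precision_step(w, x, y)
print(round(float(w), 2))                  # converges toward the true weight, 2.0
```

Note the division of labor: all the heavy arithmetic happens in FP16, but the weight update accumulates in FP32, where tiny per-step changes that FP16 would round away are preserved.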
Think of it like...
Like using a rough sketch for quick drafts but keeping a precise blueprint for final measurements — speed where it matters, precision where it counts.
Related Terms
GPU
Graphics Processing Unit — originally designed for rendering graphics, GPUs excel at the parallel mathematical operations needed for training and running AI models. They are the primary hardware for modern AI.
Quantization
The process of reducing the precision of a model's numerical weights (e.g., from 32-bit to 8-bit or 4-bit), making the model smaller and faster while accepting a small trade-off in accuracy.
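A minimal sketch of one common quantization scheme — symmetric per-tensor 8-bit quantization — shows the precision trade-off concretely. The function names are illustrative, not from any particular library.

```python
import numpy as np

def quantize_int8(w):
    """Map float32 weights onto int8 in [-127, 127] with a single scale."""
    scale = np.abs(w).max() / 127.0        # one scale for the whole tensor
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float32 weights for computation."""
    return q.astype(np.float32) * scale

w = np.array([0.5, -1.2, 0.03, 2.4], dtype=np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
# Each weight now fits in 1 byte instead of 4; the reconstruction
# error is bounded by half the quantization step (scale / 2).
print(q.dtype, float(np.abs(w - w_hat).max()))
```

The storage savings (4x here) come at the cost of a small, bounded rounding error per weight — the "small trade-off in accuracy" described above.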