Momentum
An optimization technique that accelerates gradient descent by accumulating a velocity vector in the direction of persistent gradients, helping it move past shallow local minima and average out noisy gradients.
Why It Matters
Momentum helps models train faster and more stably by smoothing out the optimization path and building speed in consistent gradient directions.
Example
Without momentum, gradient descent oscillates across narrow valleys, where the loss surface curves much more steeply in one direction than another. With momentum, the oscillating components cancel out while updates build speed along the consistent downhill direction.
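The behavior above can be sketched with the classic heavy-ball update: the velocity accumulates a decaying sum of past gradients, and the parameters move along the velocity. This is a minimal illustration, assuming NumPy and a made-up two-variable quadratic loss shaped like a narrow valley (steep along the first coordinate, shallow along the second):

```python
import numpy as np

def momentum_step(w, v, grad, lr=0.01, beta=0.9):
    # Velocity: decaying accumulation of past gradients
    v = beta * v + grad
    # Move parameters along the negative velocity direction
    w = w - lr * v
    return w, v

# Narrow-valley loss f(w) = 0.5 * (10*w[0]**2 + w[1]**2):
# plain gradient descent zig-zags along w[0]; momentum damps it.
def grad_f(w):
    return np.array([10.0 * w[0], w[1]])

w = np.array([1.0, 1.0])
v = np.zeros_like(w)
for _ in range(100):
    w, v = momentum_step(w, v, grad_f(w), lr=0.01, beta=0.9)
```

The `beta` coefficient (commonly around 0.9) controls how much of the previous velocity survives each step; the gradient components that flip sign between steps cancel inside `v`, while the consistent downhill component compounds.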
Think of it like...
Like a bowling ball rolling downhill — it builds up speed in a consistent direction and is not easily knocked off course by small bumps.
Related Terms
Gradient Descent
An optimization algorithm used to minimize the error (loss) of a model by iteratively adjusting parameters in the direction that reduces the loss most quickly. It is the primary method for training machine learning models.
Stochastic Gradient Descent
A variant of gradient descent that updates model parameters using a single random training example (or a small batch) at each step instead of the entire dataset. Each update is much cheaper to compute, and the noise in the updates can help escape shallow local minima.
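A minimal sketch of minibatch SGD, assuming NumPy and a toy one-parameter regression problem (fitting y = 3x by minimizing mean squared error) invented here for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: y = 3x plus a little noise.
X = rng.normal(size=500)
y = 3.0 * X + rng.normal(scale=0.1, size=500)

w = 0.0
lr = 0.1
for _ in range(200):
    idx = rng.integers(0, len(X), size=32)   # sample a random minibatch
    xb, yb = X[idx], y[idx]
    grad = np.mean(2 * (w * xb - yb) * xb)   # d/dw of mean squared error on the batch
    w -= lr * grad                           # step opposite the batch gradient
```

Each step sees only 32 of the 500 examples, so the gradient estimate is noisy, but on average it points downhill and the weight converges near 3.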
Adam Optimizer
An adaptive optimization algorithm that combines momentum and adaptive learning rates for each parameter. Adam maintains running averages of both gradients and squared gradients.
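A minimal sketch of the Adam update, assuming NumPy; the hyperparameter defaults here follow common practice, and the one-dimensional quadratic loss at the end is a made-up usage example:

```python
import numpy as np

def adam_step(w, m, v, grad, t, lr=0.05, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1 - b1) * grad        # running average of gradients (momentum)
    v = b2 * v + (1 - b2) * grad ** 2   # running average of squared gradients
    m_hat = m / (1 - b1 ** t)           # bias correction: averages start at zero
    v_hat = v / (1 - b2 ** t)
    # Per-parameter step: momentum direction scaled by gradient magnitude history
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

# Minimize f(w) = w**2 starting from w = 2.0.
w, m, v = 2.0, 0.0, 0.0
for t in range(1, 201):
    w, m, v = adam_step(w, m, v, grad=2 * w, t=t)
```

Dividing by the root of the squared-gradient average gives each parameter its own effective learning rate, which is why Adam is often less sensitive to the global learning rate than plain momentum.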
Learning Rate
A hyperparameter that controls how much the model's weights are adjusted in response to errors during each training step. It determines the size of the steps taken during gradient descent optimization.