Learning Rate
A hyperparameter that controls how much the model's weights are adjusted in response to errors during each training step. It determines the size of the steps taken during gradient descent optimization.
Why It Matters
The learning rate is often the single most important hyperparameter to tune. Set it too high and the loss diverges or oscillates; set it too low and training crawls or stalls before reaching a good solution.
Example
Setting a learning rate of 0.001 means each training step makes small, cautious adjustments to the model weights.
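A minimal sketch of this effect, minimizing the toy function f(w) = w^2 with plain gradient descent (the function, starting point, and step counts are illustrative, not from any particular framework):

```python
def gradient_descent(lr, steps=50, w=5.0):
    """Minimize f(w) = w**2; its gradient is 2*w."""
    for _ in range(steps):
        grad = 2 * w       # gradient of f at the current weight
        w = w - lr * grad  # the learning rate scales each step
    return w

w_small = gradient_descent(lr=0.001)  # cautious steps: barely moves from 5.0
w_good = gradient_descent(lr=0.1)     # converges close to the minimum at 0
w_big = gradient_descent(lr=1.5)      # overshoots every step and diverges
```

With lr=0.001 the weight shrinks by only 0.2% per step, which is exactly the "small, cautious adjustments" described above; with lr=1.5 each step flips the sign and doubles the magnitude, so training blows up.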
Think of it like...
Like a volume knob on learning from mistakes: turn it too high and you overreact to every error; turn it too low and you barely change at all. Finding the sweet spot is crucial.
Related Terms
Gradient Descent
An optimization algorithm used to minimize the error (loss) of a model by iteratively adjusting parameters in the direction that reduces the loss most quickly. It is the primary method for training machine learning models.
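The loop below sketches this iterative process on a made-up one-parameter regression problem, fitting y = w*x by mean squared error (the data, step count, and learning rate are hypothetical choices for illustration):

```python
# Toy data generated by the true weight w = 2
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

w, lr = 0.0, 0.05
for _ in range(200):
    # Gradient of the MSE loss (1/n) * sum((w*x - y)**2) with respect to w
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    w -= lr * grad  # step opposite the gradient, i.e. downhill on the loss
```

Each iteration moves w a little further toward 2.0, the value that minimizes the loss on this data.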
Hyperparameter
Settings that are configured before training begins and control how the model learns, as opposed to parameters, which are learned during training. Examples include the learning rate, batch size, and number of layers.
Adam Optimizer
An adaptive optimization algorithm that combines momentum and adaptive learning rates for each parameter. Adam maintains running averages of both gradients and squared gradients.
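A single-parameter sketch of the Adam update using the standard default coefficients (beta1=0.9, beta2=0.999, eps=1e-8); the function name and the toy minimization below are illustrative, not a reference implementation:

```python
import math

def adam_step(w, grad, m, v, t, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    m = beta1 * m + (1 - beta1) * grad       # running average of gradients (momentum)
    v = beta2 * v + (1 - beta2) * grad ** 2  # running average of squared gradients
    m_hat = m / (1 - beta1 ** t)             # bias correction for early steps
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)  # adaptive per-parameter step
    return w, m, v

# Usage: minimize f(w) = w**2 (gradient 2*w) starting from w = 5.0
w, m, v = 5.0, 0.0, 0.0
for t in range(1, 2001):
    w, m, v = adam_step(w, 2 * w, m, v, t)
```

Because the step is divided by the root of the squared-gradient average, the effective step size per parameter stays near lr regardless of the raw gradient's scale, which is what "adaptive learning rates for each parameter" refers to.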