Machine Learning

Layer Normalization

A technique that normalizes the inputs across the feature dimension for each individual example (rather than across the batch), then applies a learned elementwise scale and shift. It stabilizes training in transformers and RNNs.

Why It Matters

Layer normalization is a critical component in transformers. Unlike batch normalization, it works the same during training and inference, simplifying deployment.

Example

After each transformer layer computes its output, layer normalization adjusts the values to have zero mean and unit variance across the feature dimension.
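The per-example computation can be sketched in NumPy as below; the function name, the example inputs, and the epsilon value are illustrative, and the learnable scale (gamma) and shift (beta) are shown at their initial values.

```python
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-5):
    # Normalize each example over its feature dimension (last axis);
    # eps guards against division by zero for constant features.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)
    # Learned elementwise scale and shift restore expressive power.
    return gamma * x_hat + beta

# Two examples with four features each (values are illustrative).
# Statistics are computed per example, independent of the batch,
# so each row is normalized on its own.
x = np.array([[1.0, 2.0, 3.0, 4.0],
              [10.0, 20.0, 30.0, 40.0]])
out = layer_norm(x, gamma=np.ones(4), beta=np.zeros(4))
```

Note that both rows produce the same normalized output despite their different scales, which is exactly why the technique is insensitive to batch composition.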

Think of it like...

Like a sound engineer who normalizes audio levels for each track independently, ensuring consistent volume regardless of the source.

Related Terms