Noise
Random variation or errors in data that do not represent true underlying patterns. In deep learning, noise can also refer to the random input used in generative models.
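Both senses can be sketched in a few lines of NumPy (all values here are illustrative): additive measurement noise corrupting a clean signal, and random latent vectors of the kind fed to a generative model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Sense 1: noise as random variation added on top of a true pattern.
x = np.linspace(0, 1, 100)
clean_signal = np.sin(2 * np.pi * x)           # the true underlying signal
noisy_signal = clean_signal + rng.normal(0, 0.2, size=x.shape)

# Sense 2: noise as the random input a generative model starts from,
# e.g. the latent vectors sampled for a GAN generator or diffusion model.
latent_noise = rng.normal(0, 1, size=(4, 64))  # 4 samples, 64-dim latent
```

The first sense is something a model must learn to ignore; the second is a deliberate source of randomness the model transforms into varied outputs.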
Why It Matters
Understanding noise is critical: models should learn the signal (true patterns), not the noise (random variation). Overfitting often means the model has learned the noise.
Example
A dataset of house prices where some entries have typos ($50,000 instead of $500,000) or where random factors cause prices to vary from the true market value.
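A minimal way to surface a typo like that is a robust outlier check against the median (the prices and the threshold below are illustrative, not a recommended cleaning rule):

```python
import numpy as np

# Hypothetical house prices; 50_000 is a typo for 500_000.
prices = np.array([480_000, 510_000, 495_000, 50_000, 505_000])

# Flag entries that deviate from the median by more than half its value.
median = np.median(prices)
flagged = prices[np.abs(prices - median) > 0.5 * median]
```

The median is used rather than the mean because the typo itself would drag the mean far from the typical price.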
Think of it like...
Like static on a radio — the music (signal) is there, but random interference (noise) makes it harder to hear clearly.
Related Terms
Overfitting
When a model learns the training data too well — including its noise and random fluctuations — and performs poorly on new, unseen data. The model essentially memorizes rather than generalizes.
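A toy illustration of memorizing noise, using polynomial fits of different complexity (the data and degrees are arbitrary choices for the sketch):

```python
import numpy as np

rng = np.random.default_rng(42)

# Ten noisy samples from a simple linear trend: y = 2x + noise.
x_train = np.linspace(0, 1, 10)
y_train = 2 * x_train + rng.normal(0, 0.3, size=x_train.shape)

def train_mse(degree):
    """Mean squared error on the training set for a polynomial fit."""
    coeffs = np.polyfit(x_train, y_train, degree)
    pred = np.polyval(coeffs, x_train)
    return np.mean((pred - y_train) ** 2)

# A degree-9 polynomial can pass through all ten noisy points, driving
# training error to ~0: it has memorized the noise, not the trend, and
# will typically predict poorly on new x values.
```

The near-zero training error of the high-degree fit is exactly the memorization described above: the model's flexibility is spent fitting random fluctuations.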
Regularization
Techniques used to prevent overfitting by adding constraints or penalties to the model during training. Regularization discourages the model from becoming too complex or fitting noise in the training data.
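One classic penalty is L2 (ridge) regularization, which adds a term that shrinks weights toward zero. A closed-form sketch (the data and penalty strength are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression problem with noisy targets.
X = rng.normal(size=(20, 5))
w_true = np.array([1.0, 0.0, -2.0, 0.0, 0.5])
y = X @ w_true + rng.normal(0, 0.1, size=20)

def ridge(X, y, lam):
    # L2-regularized least squares: w = (X^T X + lam * I)^-1 X^T y
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

w_unreg = ridge(X, y, 0.0)   # ordinary least squares
w_reg = ridge(X, y, 10.0)    # penalized: weights are pulled toward zero
```

Larger penalties produce smaller weights, which limits how sharply the model can bend to fit noise in the training data.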
Data Quality
The degree to which data is accurate, complete, consistent, timely, and fit for its intended use. Data quality directly impacts the reliability and performance of AI models.
Denoising
The process of removing noise from data to recover the underlying clean signal. In generative AI, denoising is the core mechanism of diffusion models.
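Diffusion models learn a neural denoiser, but the basic idea can be sketched with a much simpler one: a moving-average filter that smooths out high-frequency noise (window size and noise level are arbitrary here):

```python
import numpy as np

rng = np.random.default_rng(1)

x = np.linspace(0, 1, 200)
clean = np.sin(2 * np.pi * x)
noisy = clean + rng.normal(0, 0.3, size=x.shape)

# A simple denoiser: averaging each point with its neighbors
# suppresses random variation while roughly preserving the signal.
window = np.ones(9) / 9
denoised = np.convolve(noisy, window, mode="same")
```

Away from the edges (where the window is only partially filled), the averaged signal sits much closer to the clean one than the noisy input does.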