Generalization
A model's ability to perform well on new, unseen data that was not part of its training set. Generalization is the ultimate goal of machine learning — learning patterns, not memorizing examples.
Why It Matters
A model that does not generalize is useless in production. Generalization is what separates a useful AI system from an expensive lookup table.
Example
A spam classifier trained on 2024 emails that still correctly identifies spam in 2025 despite evolving spam tactics — it learned the underlying patterns, not specific examples.
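Generalization can be measured directly: train on one split of the data, then compare accuracy on that split against accuracy on held-out data the model never saw. A minimal numpy sketch with synthetic two-class data (the classifier and all numbers here are illustrative, not a real spam filter):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "emails": 2-D feature vectors, two classes (spam / not spam).
spam = rng.normal(loc=[2.0, 2.0], scale=1.0, size=(100, 2))
ham = rng.normal(loc=[-2.0, -2.0], scale=1.0, size=(100, 2))
X = np.vstack([spam, ham])
y = np.array([1] * 100 + [0] * 100)

# Hold out a test set the model never sees during training.
idx = rng.permutation(len(X))
train, test = idx[:150], idx[150:]

# "Training": compute per-class centroids from the training split only.
c1 = X[train][y[train] == 1].mean(axis=0)
c0 = X[train][y[train] == 0].mean(axis=0)

def predict(points):
    # Classify each point by its nearest class centroid.
    d1 = np.linalg.norm(points - c1, axis=1)
    d0 = np.linalg.norm(points - c0, axis=1)
    return (d1 < d0).astype(int)

train_acc = (predict(X[train]) == y[train]).mean()
test_acc = (predict(X[test]) == y[test]).mean()
# A small train/test gap is the signature of good generalization.
print(f"train accuracy: {train_acc:.2f}, test accuracy: {test_acc:.2f}")
```

The quantity of interest is the gap between the two accuracies: a model that generalizes well scores nearly as high on the test split as on the training split.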
Think of it like...
Like learning to drive in one city and being able to drive competently in any city — you learned the principles of driving, not just one specific route.
Related Terms
Overfitting
When a model learns the training data too well — including its noise and random fluctuations — and performs poorly on new, unseen data. The model essentially memorizes rather than generalizes.
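Overfitting is easy to demonstrate with polynomial fitting: a model with enough capacity can drive training error to near zero by memorizing the noise. A sketch using numpy's `polyfit` on synthetic data (the true relationship is linear; degrees 1 and 9 are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(42)

# The true relationship is linear; the noise is random fluctuation.
x_train = np.linspace(0.0, 1.0, 10)
y_train = 2.0 * x_train + rng.normal(scale=0.5, size=10)
x_test = np.linspace(0.05, 0.95, 10)  # unseen points
y_test = 2.0 * x_test + rng.normal(scale=0.5, size=10)

def mse(coeffs, x, y):
    # Mean squared error of a fitted polynomial on (x, y).
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

simple = np.polyfit(x_train, y_train, deg=1)    # matches the true pattern
complex_ = np.polyfit(x_train, y_train, deg=9)  # can memorize all 10 points

# The degree-9 fit nails the training set (it interpolates the noise),
# but typically does far worse on the unseen points: that is overfitting.
print(mse(simple, x_train, y_train), mse(complex_, x_train, y_train))
print(mse(simple, x_test, y_test), mse(complex_, x_test, y_test))
```

The degree-9 polynomial is guaranteed to have training error no higher than the linear fit (its hypothesis space contains every line), which is exactly why low training error alone says nothing about generalization.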
Training Data
The dataset used to teach a machine learning model. It contains examples (and often labels) that the model learns patterns from during the training process. The quality and quantity of training data directly impact model performance.
Test Data
A separate portion of data held back from training that is used to evaluate a model's performance on unseen examples. Test data provides an unbiased estimate of how well the model will perform in the real world.
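A holdout test set is usually created by shuffling indices and slicing; the 80/20 ratio below is a common convention, not a rule. A pure-numpy sketch:

```python
import numpy as np

def train_test_split(n_samples, test_fraction=0.2, seed=0):
    """Return disjoint train/test index arrays covering all samples."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)  # shuffle so the split is random
    n_test = int(n_samples * test_fraction)
    return idx[n_test:], idx[:n_test]  # (train indices, test indices)

train_idx, test_idx = train_test_split(100)
print(len(train_idx), len(test_idx))  # → 80 20
```

The key property is that the two index sets are disjoint: the model is fit using only the training indices, so the test indices give an unbiased look at unseen data.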
Regularization
Techniques used to prevent overfitting by adding constraints or penalties to the model during training. Regularization discourages the model from becoming too complex or fitting noise in the training data.
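One classic instance is L2 (ridge) regularization, which adds a penalty λ‖w‖² to the least-squares objective; the closed-form solution below is the standard ridge estimator, while the data and the λ value are synthetic and illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(scale=0.1, size=30)

def ridge(X, y, lam):
    """Minimize ||Xw - y||^2 + lam * ||w||^2 (closed form)."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

w_plain = ridge(X, y, lam=0.0)   # ordinary least squares
w_ridge = ridge(X, y, lam=10.0)  # penalty shrinks the weights
# Shrinkage: the penalized solution has a smaller norm, which limits
# the model's effective complexity and its ability to fit noise.
print(np.linalg.norm(w_plain), np.linalg.norm(w_ridge))
```

Larger λ shrinks the weights further; the norm of the ridge solution decreases monotonically in λ, which is the constraint-on-complexity effect the definition describes.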
Cross-Validation
A model evaluation technique that splits data into multiple folds, trains on some folds and tests on the held-out fold, repeating so every fold serves as the test set. It provides a robust estimate of model performance.
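The fold mechanics are simple enough to sketch directly; k=5 mirrors common practice, and the sample count here is arbitrary:

```python
import numpy as np

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) for each of k folds.

    Every sample lands in exactly one test fold, so averaging the
    per-fold scores uses all of the data for evaluation.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    folds = np.array_split(idx, k)  # k nearly equal parts
    for i in range(k):
        train = np.concatenate(folds[:i] + folds[i + 1:])
        yield train, folds[i]

sizes = [len(test) for _, test in k_fold_indices(20, k=5)]
print(sizes)  # → [4, 4, 4, 4, 4]
```

In practice you would fit a model on each `train` slice, score it on the matching `test` slice, and report the mean and spread of the k scores as the performance estimate.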