Self-Supervised Learning
A training approach where the model generates its own labels from the data, typically by masking or hiding parts of the input and learning to predict them. No human-annotated labels are needed.
Why It Matters
Self-supervised learning enabled the training of foundation models on internet-scale data. It eliminated the bottleneck of manual labeling for pre-training.
Example
BERT masks random words in a sentence and learns to predict them; GPT learns to predict the next word. Both create their own training signal from raw text.
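A minimal sketch of the GPT-style idea, in plain Python rather than real model code: every position in a raw sentence yields an (input context, next-word label) pair, so the labels come from the text itself. The whitespace split stands in for a real subword tokenizer.

```python
def next_word_pairs(text):
    """GPT-style objective sketch: each word's label is simply
    the word that follows it, so raw text labels itself."""
    tokens = text.split()  # stand-in for a real tokenizer
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

# One raw sentence, no human annotation, yields five training pairs.
pairs = next_word_pairs("the cat sat on the mat")
for context, label in pairs:
    print(context, "->", label)
```

Here the model would be trained to map each context to its label; the point is only that the (input, target) pairs are manufactured from the data itself.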
Think of it like...
Like a student who covers parts of a textbook page and quizzes themselves on the hidden content — they create their own learning exercises from the material.
Related Terms
Pre-training
The initial phase of training a model on a large, general-purpose dataset before specializing it for specific tasks. Pre-training gives the model broad knowledge and capabilities.
Masked Language Model
A training approach where random tokens in the input are replaced with a special [MASK] token and the model learns to predict the original tokens from context. This is how BERT was pre-trained.
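The masking step can be sketched in a few lines of Python. This is an illustration of the idea, not BERT's actual preprocessing: roughly 15% of tokens are replaced with [MASK], and the hidden originals become the labels the model must predict.

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, mask_prob=0.15, seed=1):
    """BERT-style masking sketch: hide ~15% of tokens at random;
    the hidden originals become the prediction targets."""
    rng = random.Random(seed)  # fixed seed for a reproducible demo
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            masked.append(MASK)
            targets[i] = tok  # label = the original token at this position
        else:
            masked.append(tok)
    return masked, targets

tokens = "the quick brown fox jumps over the lazy dog".split()
masked, targets = mask_tokens(tokens)
print(masked)   # corrupted input the model sees
print(targets)  # hidden originals it must recover
```

Restoring each target at its masked position reconstructs the original sentence, which is exactly the self-generated supervision signal.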
Contrastive Learning
A self-supervised technique where the model learns by comparing similar (positive) and dissimilar (negative) pairs of examples. It learns representations where similar items are close and different items are far apart.
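The comparison can be made concrete with a small InfoNCE-style loss, sketched here in plain Python with toy 2-D vectors standing in for learned embeddings. The loss is low when the anchor is more similar to its positive pair than to the negatives, and high otherwise.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def contrastive_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss sketch: negative log-softmax of the
    anchor-positive similarity against all negative similarities."""
    sims = [cosine(anchor, positive)] + [cosine(anchor, n) for n in negatives]
    logits = [s / temperature for s in sims]
    m = max(logits)  # subtract max for numerical stability
    denom = sum(math.exp(l - m) for l in logits)
    return -(logits[0] - m - math.log(denom))

# Toy embeddings: the positive points nearly the same way as the anchor.
anchor = [1.0, 0.0]
positive = [0.9, 0.1]
negative = [0.0, 1.0]

good = contrastive_loss(anchor, positive, [negative])  # correct pairing: low loss
bad = contrastive_loss(anchor, negative, [positive])   # swapped pairing: high loss
print(good, bad)
```

Minimizing this loss pulls positive pairs together and pushes negatives apart, which is how the representation described above emerges.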
Foundation Model
A large AI model trained on broad data at scale that can be adapted to a wide range of downstream tasks. Foundation models serve as the base upon which specialized applications are built.
Unsupervised Learning
A type of machine learning where the model learns patterns from unlabeled data without being told what the correct output should be. The algorithm discovers hidden structures, groupings, or patterns in the data on its own.