Pre-training
The initial phase of training a model on a large, general-purpose dataset before specializing it for specific tasks. Pre-training gives the model broad knowledge and capabilities.
Why It Matters
Pre-training is what makes foundation models powerful — the massive upfront investment creates a versatile base that can be adapted to countless downstream applications.
Example
Training GPT-4 on trillions of tokens of text from the internet, books, and code — giving it general language understanding before any task-specific fine-tuning.
Think of it like...
Like getting a broad liberal arts education before specializing in medical school — the general knowledge provides a foundation for deeper expertise.
Related Terms
Fine-Tuning
The process of taking a pre-trained model and further training it on a smaller, domain-specific dataset to specialize its behavior for a particular task or domain. Fine-tuning adjusts the model's weights to improve performance on the target task.
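The mechanics can be illustrated with a deliberately tiny sketch in plain Python: a one-parameter linear model is first "pre-trained" on a large general dataset, then "fine-tuned" on a small domain-specific dataset starting from the pre-trained weight rather than from scratch. The data, learning rates, and the `train` helper are all hypothetical, chosen only to show the weight-adjustment idea, not how real large-model fine-tuning is implemented.

```python
# Toy illustration of fine-tuning: a 1-parameter linear model y = w * x.
# All data and hyperparameters here are hypothetical.

def train(w, data, lr, steps):
    """Plain gradient descent on squared error, updating weight w in place."""
    for _ in range(steps):
        for x, y in data:
            grad = 2 * (w * x - y) * x   # d/dw of (w*x - y)^2
            w -= lr * grad
    return w

# "Pre-training": a larger, general dataset drawn from y = 2x.
general_data = [(float(x), 2.0 * x) for x in range(1, 6)]
w_pretrained = train(w=0.0, data=general_data, lr=0.01, steps=100)

# "Fine-tuning": a small, domain-specific dataset drawn from y = 2.5x.
# We start from the pre-trained weight instead of from zero.
domain_data = [(1.0, 2.5), (2.0, 5.0)]
w_finetuned = train(w=w_pretrained, data=domain_data, lr=0.01, steps=50)

print(round(w_pretrained, 2))  # near the general slope 2.0
print(round(w_finetuned, 2))   # pulled toward the domain slope 2.5
```

Starting from the pre-trained weight is what makes the small domain dataset sufficient; training from scratch on two data points would be far more fragile.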
Transfer Learning
A technique where a model trained on one task is repurposed as the starting point for a model on a different but related task. Instead of training from scratch, you leverage knowledge the model has already acquired.
Foundation Model
A large AI model trained on broad data at scale that can be adapted to a wide range of downstream tasks. Foundation models serve as the base upon which specialized applications are built.
Self-Supervised Learning
A training approach where the model generates its own labels from the data, typically by hiding part of the input — a masked-out token or simply the next word — and learning to predict it from the rest. No human-annotated labels are needed.
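A minimal sketch of the "data labels itself" idea, using next-token prediction (the objective behind GPT-style pre-training mentioned above). The `next_token_pairs` helper is an illustrative name, not a real library function: it turns raw, unlabeled text into (context, target) training pairs with no human annotation.

```python
# Self-supervised label creation: each prefix of a token sequence is an
# input, and the token that follows it is the label. The raw text alone
# supplies both sides of every training pair.

def next_token_pairs(text):
    tokens = text.split()
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

pairs = next_token_pairs("the cat sat on the mat")
for context, target in pairs:
    print(context, "->", target)
```

Six tokens of raw text yield five supervised training pairs for free; at internet scale, this is what lets pre-training consume trillions of tokens without any labeling effort.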
Masked Language Model
A training approach where random tokens in the input are replaced with a special [MASK] token and the model learns to predict the original tokens from context. This is how BERT was pre-trained.
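A simplified sketch of BERT-style masking in plain Python. Real BERT pre-training also sometimes keeps the original token or substitutes a random one instead of always inserting `[MASK]`; this version shows only the core mask-and-predict setup, and `mask_tokens` is a hypothetical helper name.

```python
import random

# Masked language modeling data prep (simplified): replace ~15% of tokens
# with [MASK]; the hidden originals become the prediction targets.

def mask_tokens(tokens, mask_prob=0.15, seed=0):
    rng = random.Random(seed)       # fixed seed for a reproducible example
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append("[MASK]")
            labels.append(tok)       # the model must recover this token
        else:
            masked.append(tok)
            labels.append(None)      # no prediction at unmasked positions
    return masked, labels

tokens = "the quick brown fox jumps over the lazy dog".split()
masked, labels = mask_tokens(tokens)
print(masked)
print(labels)
```

The model sees `masked` as input and is trained to predict each non-`None` label from the surrounding context, which is why masked positions can draw on words both before and after them.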