Machine Learning

Pre-training

The initial phase of training a model on a large, general-purpose dataset before specializing it for specific tasks. Pre-training gives the model broad knowledge and capabilities.

Why It Matters

Pre-training is what makes foundation models powerful — the massive upfront investment creates a versatile base that can be adapted to countless downstream applications.

Example

Training GPT-4 on trillions of tokens of text from the internet, books, and code — giving it general language understanding before any task-specific fine-tuning.
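The core objective behind this kind of pre-training is self-supervised next-token prediction: the model learns to predict each token from the tokens before it, with no labels beyond the text itself. A toy sketch of that objective, using a character-level bigram model trained by counting (all names here are illustrative; real pre-training uses neural networks and vastly more data, but the objective is the same):

```python
from collections import Counter, defaultdict

def pretrain(corpus: str) -> dict:
    """'Pre-train' a bigram model: count, for every character in the
    corpus, which character tends to follow it. The counts are the model."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(corpus, corpus[1:]):
        counts[prev][nxt] += 1
    return counts

def predict_next(model: dict, char: str) -> str:
    """Predict the most likely next character seen during pre-training."""
    return model[char].most_common(1)[0][0]

# A tiny stand-in for a "large, general-purpose dataset".
corpus = "the quick brown fox jumps over the lazy dog. the end."
model = pretrain(corpus)
print(predict_next(model, "t"))  # most 't's in the corpus are followed by 'h'
```

The same learned statistics could then be reused or adjusted for a narrower corpus, which is the counting-model analogue of task-specific fine-tuning.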

Think of it like...

Like getting a broad liberal arts education before specializing in medical school — the general knowledge provides a foundation for deeper expertise.

Related Terms