Cross-Entropy
A loss function commonly used in classification tasks that measures the difference between the predicted probability distribution and the actual distribution. Lower cross-entropy means better predictions.
Why It Matters
Cross-entropy is the standard loss function for training classification models and language models. Minimizing it pushes the model toward confident, correct predictions.
Example
If the true class is 'cat' (probability 1.0 for cat, 0 for others) and the model predicts 0.7 for cat, cross-entropy penalizes the model for not being confident enough: the loss is -log(0.7) ≈ 0.36, and it shrinks toward 0 as the predicted probability for the true class approaches 1.
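The example above can be sketched in a few lines. When the true label is one-hot, cross-entropy reduces to the negative log of the probability assigned to the true class; the class names and probability values here are illustrative, not from any real model.

```python
import math

def cross_entropy(true_index, predicted_probs):
    """Cross-entropy with a one-hot true label: -log(p of the true class)."""
    return -math.log(predicted_probs[true_index])

# Hypothetical predictions over ['cat', 'dog', 'bird'], true class = 'cat' (index 0).
loss_ok = cross_entropy(0, [0.70, 0.20, 0.10])   # not confident enough
loss_good = cross_entropy(0, [0.95, 0.03, 0.02]) # more confident -> lower loss
```

Note that a confident wrong prediction is punished heavily: as the probability of the true class approaches 0, -log(p) grows without bound.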
Think of it like...
Like a weather forecaster being scored — if they say 70% chance of rain and it rains, they did okay, but they would score better if they had said 95%.
Related Terms
Loss Function
A mathematical function that measures how far a model's predictions are from the actual correct values. The goal of training is to minimize this loss function, making predictions as accurate as possible.
Softmax
A function that converts a vector of numbers into a probability distribution, where each value is between 0 and 1 and all values sum to 1. It is typically used as the final layer in classification models.
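A minimal softmax sketch, using the common max-subtraction trick for numerical stability (the input values below are arbitrary examples):

```python
import math

def softmax(logits):
    # Subtracting the max before exponentiating avoids overflow
    # and does not change the result.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])  # each value in (0, 1), values sum to 1
```

In a classifier, these probabilities are exactly what cross-entropy compares against the true label.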
Classification
A type of supervised learning task where the model predicts which category or class an input belongs to. The output is a discrete label rather than a continuous value.
Perplexity
A metric that measures how well a language model predicts text. Lower perplexity indicates the model is less 'surprised' by the text, meaning it can predict the next token more accurately.
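Perplexity is the exponential of the average cross-entropy per token, which makes the connection between the two concrete. A small sketch, using made-up token probabilities rather than output from a real model:

```python
import math

# Hypothetical probabilities a language model assigned to the actual next token
# at each position in a short text.
token_probs = [0.5, 0.25, 0.8, 0.1]

# Average negative log-probability per token (the per-token cross-entropy).
avg_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)

# Perplexity = exp(average cross-entropy).
perplexity = math.exp(avg_nll)
```

Intuitively, a perplexity of k means the model is, on average, as uncertain as if it were choosing uniformly among k tokens; a model guessing uniformly over a vocabulary of size V has perplexity exactly V.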