F1 Score
The harmonic mean of precision and recall: F1 = 2 × (precision × recall) / (precision + recall). It provides a single metric that balances both. F1 scores range from 0 to 1, with 1 meaning perfect precision and recall.
Why It Matters
F1 score is the go-to metric when you need to balance finding all relevant items (recall) with avoiding false alarms (precision). Because the harmonic mean punishes lopsided performance, F1 is especially useful on imbalanced datasets where accuracy alone can mislead.
Example
A model with 0.80 precision and 0.90 recall has an F1 score of 0.847 — a single number that captures performance from both angles.
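The calculation above can be sketched in a few lines of Python; `f1_score` here is a hypothetical helper, not a particular library's API:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; defined as 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# The example from the text: 0.80 precision, 0.90 recall
print(round(f1_score(0.80, 0.90), 3))  # → 0.847
```

Note that the harmonic mean pulls toward the lower of the two values: a model with 1.0 precision but 0.1 recall scores only about 0.18, not the 0.55 an ordinary average would suggest.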
Think of it like...
Like a school grade that equally weights attendance and test scores — you need both to score well, and being terrible at one drags down the whole grade.
Related Terms
Precision
Of all the items the model predicted as positive, the proportion that were actually positive. Precision measures how trustworthy the model's positive predictions are.
Recall
Of all the actually positive items in the dataset, the proportion that the model correctly identified. Recall measures how completely the model finds all relevant items.
Accuracy
The percentage of correct predictions out of all predictions made by a model. While intuitive, accuracy can be misleading for imbalanced datasets.
Evaluation
The systematic process of measuring an AI model's performance, safety, and reliability using various metrics, benchmarks, and testing methodologies.
Confusion Matrix
A table that summarizes the performance of a classification model by showing true positives, true negatives, false positives, and false negatives. It reveals the types of errors a model makes.
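All of the metrics above can be derived from the four confusion-matrix counts. A minimal sketch, using hypothetical counts for an imbalanced dataset (990 negatives, 10 positives) to show how accuracy and F1 can diverge:

```python
def metrics_from_confusion(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Derive precision, recall, accuracy, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "accuracy": accuracy, "f1": f1}

# Hypothetical imbalanced case: the model catches only half the positives
m = metrics_from_confusion(tp=5, fp=5, fn=5, tn=985)
print(round(m["accuracy"], 2), round(m["f1"], 2))  # → 0.99 0.5
```

Accuracy looks near-perfect (0.99) because the negatives dominate, while F1 (0.5) exposes that the model misses half the positive cases, which is exactly why F1 is preferred over accuracy for imbalanced data.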