TF-IDF
Term Frequency-Inverse Document Frequency — a statistical measure of how important a word is to a document within a collection. A term scores high when it is frequent in one document but rare across the rest of the collection.
Why It Matters
TF-IDF is foundational to information retrieval and text analysis. Understanding it explains why search engines rank documents by more than raw word counts.
Example
The word 'neural' appearing 20 times in an AI paper scores high (frequent in this doc, rare overall), while 'the' appearing 50 times scores low (common everywhere).
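The weighting behind this example can be sketched in a few lines. This is a minimal illustration using raw term counts and a natural-log IDF; the corpus and the exact smoothing are assumptions, and production libraries (e.g. scikit-learn) apply additional normalization.

```python
import math

def tf_idf(term, doc, corpus):
    # Term frequency: raw count of the term in this document.
    tf = doc.count(term)
    # Document frequency: how many documents contain the term at all.
    df = sum(1 for d in corpus if term in d)
    # Inverse document frequency: rare terms get a large boost,
    # ubiquitous terms get a weight near zero.
    idf = math.log(len(corpus) / df) if df else 0.0
    return tf * idf

# Toy corpus (tokenized documents) chosen for illustration.
docs = [
    ["neural", "networks", "learn", "neural", "representations"],
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "barked"],
]

# 'neural' is frequent in doc 0 and appears nowhere else, so it scores high;
# 'the' appears in most documents, so its IDF drags the score down.
print(tf_idf("neural", docs[0], docs))  # 2 * ln(3/1) ≈ 2.197
print(tf_idf("the", docs[1], docs))     # 2 * ln(3/2) ≈ 0.811
```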
Think of it like...
Like judging a chef's specialty — if they make pasta every day (high frequency) and no other chef in town does (rare), pasta is clearly their defining dish.
Related Terms
BM25
Best Matching 25 — a widely used ranking function for keyword-based information retrieval. BM25 scores documents based on query term frequency, document length, and corpus statistics, refining TF-IDF-style weighting with term-frequency saturation and document-length normalization.
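The ingredients above (term frequency, document length, corpus statistics) can be seen in a compact sketch of the Okapi BM25 formula for a single query term. The toy corpus and the default parameter values are assumptions for illustration; real search engines tune k1 and b and sum scores over all query terms.

```python
import math

def bm25_score(term, doc, corpus, k1=1.5, b=0.75):
    # Okapi BM25 score of one query term for one document.
    # k1 controls term-frequency saturation; b controls length normalization.
    n = sum(1 for d in corpus if term in d)            # docs containing term
    idf = math.log((len(corpus) - n + 0.5) / (n + 0.5) + 1)
    tf = doc.count(term)
    avgdl = sum(len(d) for d in corpus) / len(corpus)  # average doc length
    return idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(doc) / avgdl))

# Same toy corpus as before, chosen for illustration.
docs = [
    ["neural", "networks", "learn", "neural", "representations"],
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "barked"],
]

# A rare, repeated term outranks a ubiquitous one, as with TF-IDF,
# but repeated occurrences saturate instead of growing linearly.
print(bm25_score("neural", docs[0], docs))
print(bm25_score("the", docs[1], docs))
```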
Text Mining
The process of deriving meaningful patterns, trends, and insights from large collections of text data using NLP and statistical techniques.
Natural Language Processing
The branch of AI that deals with the interaction between computers and human language. NLP enables machines to read, interpret, and generate human language in useful ways.
Feature Engineering
The process of selecting, transforming, and creating input variables (features) from raw data to improve model performance. It requires domain knowledge to identify what information is most useful for the model.
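A small sketch of what that process looks like in practice, turning a raw record into model-ready features. The record format and the specific features are hypothetical, chosen to show how domain knowledge (e.g. that weekend traffic behaves differently) shapes the features a model receives.

```python
from datetime import datetime

def engineer_features(raw):
    # Derive model-ready features from one raw log record.
    # The input schema here is a made-up example, not a standard format.
    ts = datetime.fromisoformat(raw["timestamp"])
    text = raw["message"]
    return {
        "hour_of_day": ts.hour,            # time-of-day often predicts behavior
        "is_weekend": ts.weekday() >= 5,   # domain knowledge: weekends differ
        "msg_length": len(text),           # simple derived numeric feature
        "has_error": "error" in text.lower(),  # keyword flag as a binary feature
    }

record = {"timestamp": "2024-06-01T14:30:00", "message": "Error: timeout"}
print(engineer_features(record))
```

Each output key is a feature a downstream model can consume directly, which is the point: the raw record itself is not in a form most models can use.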