Semantic Similarity
A measure of how similar in meaning two pieces of text are, regardless of the specific words used. Semantic similarity captures conceptual relatedness rather than lexical overlap.
Why It Matters
Semantic similarity powers deduplication, plagiarism detection, FAQ matching, and RAG retrieval. Because it compares meaning rather than surface form, it can match texts that express the same idea in different words.
Example
The sentences 'The cat is sleeping' and 'The feline is resting' have high semantic similarity despite sharing zero content words.
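To make the "zero content words" point concrete, here is a minimal sketch that measures purely lexical overlap between the two sentences. The stop-word list and the function names (`content_words`, `jaccard_overlap`) are assumptions made for illustration, not part of any particular library.

```python
# Hypothetical, deliberately tiny stop-word list; real lists are much longer.
STOP_WORDS = {"the", "is", "a", "an", "of"}


def content_words(sentence: str) -> set[str]:
    """Lowercase, split on whitespace, and drop stop words."""
    return {w for w in sentence.lower().split() if w not in STOP_WORDS}


def jaccard_overlap(a: str, b: str) -> float:
    """Fraction of content words shared by both sentences (0.0 to 1.0)."""
    words_a, words_b = content_words(a), content_words(b)
    union = words_a | words_b
    if not union:
        return 0.0
    return len(words_a & words_b) / len(union)


score = jaccard_overlap("The cat is sleeping", "The feline is resting")
print(score)  # 0.0
```

A word-overlap measure like this scores the pair as completely dissimilar, which is the gap that semantic similarity is meant to close.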
Think of it like...
Like recognizing that a suit and a tuxedo are similar despite looking different — they serve the same purpose and share the same essential qualities.
Related Terms
Cosine Similarity
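A metric that measures the similarity between two vectors by calculating the cosine of the angle between them. Values range from -1 (pointing in opposite directions) to 1 (pointing in the same direction), with 0 meaning the vectors are orthogonal, which is usually read as unrelated. Because only the angle matters, two vectors of different lengths can still score 1.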
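As a rough illustration, cosine similarity is the dot product of two vectors divided by the product of their lengths. The sketch below uses only the Python standard library; the function name `cosine_similarity` is chosen here for illustration, and raising an error for zero-length vectors is one possible design choice among several.

```python
import math


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between u and v: dot(u, v) / (|u| * |v|)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        # The angle is undefined when either vector has zero length.
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


print(cosine_similarity([3.0, 4.0], [6.0, 8.0]))   # 1.0  (same direction)
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))   # 0.0  (orthogonal)
print(cosine_similarity([1.0, 0.0], [-1.0, 0.0]))  # -1.0 (opposite)
```

The first call shows that scaling a vector does not change the score: `[6.0, 8.0]` is twice `[3.0, 4.0]`, yet the similarity is still 1.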
Embedding
A numerical representation of data (text, images, etc.) as a vector of numbers in a high-dimensional space. Similar items are placed closer together in this space, so semantic relationships can be measured as distances or angles between vectors.
Semantic Search
Search that understands the meaning and intent behind a query rather than just matching keywords. It uses embeddings to find results that are conceptually related even if they use different words.
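The ranking step of an embedding-based search can be sketched as follows. The three-dimensional vectors are hand-made stand-ins, not the output of a real embedding model, and the names `DOC_VECTORS`, `cosine`, and `search` are hypothetical. In practice, the documents and the query would be embedded by the same model before this step.

```python
import math

# Hypothetical toy "embeddings": hand-assigned vectors, NOT real model output.
DOC_VECTORS = {
    "How do I reset my password?": [0.9, 0.1, 0.0],
    "Shipping times for international orders": [0.0, 0.2, 0.9],
    "Steps to recover a locked account": [0.8, 0.3, 0.1],
}


def cosine(u, v):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def search(query_vector, doc_vectors, top_k=2):
    """Return the top_k document texts, most similar to the query first."""
    scored = [(cosine(query_vector, vec), text) for text, vec in doc_vectors.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:top_k]]


# Stand-in vector for a query such as "I forgot my login".
query_vector = [0.9, 0.05, 0.0]
results = search(query_vector, DOC_VECTORS)
print(results)
```

With these toy vectors, the two account-related documents rank above the shipping document even though the example query shares no keywords with them.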
Natural Language Understanding
The ability of an AI system to comprehend the meaning, intent, and context of human language, going beyond surface-level word matching to grasp semantics, pragmatics, and implied meaning.