Sentence Transformers
A framework for computing dense vector representations (embeddings) for sentences and paragraphs. Built on top of transformer models and optimized for semantic similarity tasks.
Why It Matters
Sentence Transformers are one of the most practical ways to create text embeddings for search, clustering, and retrieval-augmented generation (RAG). They bridge the gap between raw transformer models and usable embeddings.
Example
Using the all-MiniLM-L6-v2 model to embed 1 million FAQ entries, enabling semantic search that finds the right answer even when users phrase questions differently.
Think of it like...
Like a universal translator for meaning — it converts any sentence into a standardized numerical fingerprint that captures its meaning.
Related Terms
Embedding
A numerical representation of data (text, images, etc.) as a vector of numbers in a high-dimensional space. Similar items are placed closer together in this space, enabling machines to understand semantic relationships.
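A toy numpy sketch of "similar items are placed closer together" (the 3-dimensional vectors are invented for illustration; real embeddings have hundreds of dimensions):

```python
import numpy as np

# Hand-made 3-d "embeddings" (invented values; real models emit
# hundreds of dimensions).
cat    = np.array([0.90, 0.80, 0.10])
kitten = np.array([0.85, 0.75, 0.15])
truck  = np.array([0.10, 0.20, 0.90])

# Euclidean distance: semantically related items end up closer.
print(np.linalg.norm(cat - kitten))  # small distance
print(np.linalg.norm(cat - truck))   # large distance
```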
Bi-Encoder
A model that independently encodes two texts into separate vectors, then compares them using a similarity metric like cosine similarity. Bi-encoders are fast because vectors can be pre-computed.
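The comparison step can be written from the cosine-similarity formula directly. A sketch with random stand-in vectors (real vectors would come from a bi-encoder's output):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """cos(theta) = (a . b) / (|a| * |b|); in [-1, 1] for real vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# A bi-encoder encodes each text independently, so corpus vectors can
# be computed once and cached (random vectors stand in for embeddings).
rng = np.random.default_rng(0)
corpus_vectors = rng.normal(size=(1000, 384))  # pre-computed offline

query_vector = rng.normal(size=384)            # encoded at query time
scores = [cosine_similarity(query_vector, v) for v in corpus_vectors]
best = int(np.argmax(scores))
print(best, scores[best])
```

Because the corpus side never has to be re-encoded per query, this is what makes bi-encoders fast at retrieval time.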
Semantic Similarity
A measure of how similar in meaning two pieces of text are, regardless of the specific words used. Semantic similarity captures conceptual relatedness rather than lexical overlap.
Vector Database
A specialized database designed to store, index, and search high-dimensional vector embeddings efficiently. It enables fast similarity searches across millions or billions of vectors.
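The core operation a vector database accelerates is nearest-neighbour search. A brute-force sketch in numpy (a real database would replace the full scan with an approximate index such as HNSW; sizes and vectors here are invented):

```python
import numpy as np

# Brute-force top-k search over normalized vectors -- the operation a
# vector database speeds up with approximate-nearest-neighbour indexes.
rng = np.random.default_rng(42)
index = rng.normal(size=(10_000, 64))
index /= np.linalg.norm(index, axis=1, keepdims=True)  # unit length

query = rng.normal(size=64)
query /= np.linalg.norm(query)

# For unit vectors, cosine similarity is one matrix-vector product.
scores = index @ query
top_k = np.argsort(scores)[::-1][:5]  # indices of the 5 best matches
print(top_k, scores[top_k])
```

This linear scan is fine at this scale but becomes the bottleneck at millions or billions of vectors, which is where dedicated indexes earn their keep.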
Hugging Face
The leading open-source platform for sharing and discovering AI models, datasets, and applications. Hugging Face hosts the Transformers library and a community hub with thousands of pre-trained models.