Embedding
A numerical representation of data (text, images, etc.) as a vector of numbers in a high-dimensional space. Similar items are placed closer together in this space, enabling machines to understand semantic relationships.
Why It Matters
Embeddings power semantic search, recommendation systems, and RAG pipelines. They are how AI understands meaning beyond simple keyword matching.
Example
The words 'king' and 'queen' would have similar embeddings (close together in vector space), while 'king' and 'bicycle' would be far apart.
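This intuition can be sketched in a few lines of Python. The 2-D vectors below are made up for illustration; real embedding models produce vectors with hundreds or thousands of dimensions.

```python
import math

# Toy 2-D embeddings (invented for illustration; real models
# output much higher-dimensional vectors)
embeddings = {
    "king":    [0.90, 0.80],
    "queen":   [0.88, 0.82],
    "bicycle": [0.10, -0.70],
}

def cosine(a, b):
    """Cosine similarity: dot product divided by the product of norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(cosine(embeddings["king"], embeddings["queen"]))    # close to 1
print(cosine(embeddings["king"], embeddings["bicycle"]))  # negative: far apart
```

'king' and 'queen' point in nearly the same direction, so their similarity is close to 1; 'king' and 'bicycle' point in very different directions, so it is low.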
Think of it like...
Like plotting cities on a map — cities that are culturally and geographically similar end up close together, making it easy to find related things by looking at what is nearby.
Related Terms
Vector Database
A specialized database designed to store, index, and search high-dimensional vector embeddings efficiently. It enables fast similarity searches across millions or billions of vectors.
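A minimal in-memory sketch of what a vector database does: store (id, vector) pairs and return the nearest neighbors of a query by cosine similarity. Real systems replace the brute-force loop with approximate indexes (such as HNSW) to stay fast at millions of vectors; everything below is a toy assumption, not a real database API.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

class TinyVectorStore:
    """Brute-force stand-in for a vector database."""
    def __init__(self):
        self.items = []  # list of (item_id, vector) pairs

    def add(self, item_id, vector):
        self.items.append((item_id, vector))

    def search(self, query, k=3):
        # Score every stored vector against the query, best first
        scored = sorted(self.items,
                        key=lambda item: cosine(query, item[1]),
                        reverse=True)
        return [item_id for item_id, _ in scored[:k]]

store = TinyVectorStore()
store.add("doc-a", [1.0, 0.0])
store.add("doc-b", [0.9, 0.1])
store.add("doc-c", [0.0, 1.0])
print(store.search([1.0, 0.05], k=2))  # ['doc-a', 'doc-b']
```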
Semantic Search
Search that understands the meaning and intent behind a query rather than just matching keywords. It uses embeddings to find results that are conceptually related even if they use different words.
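The difference from keyword matching can be shown with a toy example. The embeddings below are invented to stand in for a real model's output: a keyword search for "monarch" finds nothing, while comparing embeddings surfaces the conceptually related document.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Made-up embeddings standing in for a real embedding model
emb = {
    "monarch": [0.85, 0.80],
    "king":    [0.90, 0.78],
    "bicycle": [0.10, -0.70],
}
docs = ["king", "bicycle"]
query = "monarch"

# Keyword search: no document contains the literal string "monarch"
keyword_hits = [d for d in docs if query in d]
print(keyword_hits)  # []

# Semantic search: compare meanings (vectors), not strings
best = max(docs, key=lambda d: cosine(emb[query], emb[d]))
print(best)  # 'king'
```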
Retrieval-Augmented Generation
A technique that enhances LLM outputs by first retrieving relevant information from external knowledge sources and then using that information as context for generation. RAG combines the power of search with the fluency of language models.
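A bare-bones sketch of the retrieval half of RAG, under the assumption of a tiny corpus with made-up embeddings: rank documents against the query vector, then paste the top results into the prompt that would be handed to an LLM. The final generation step is omitted since it depends on whichever model you call.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy corpus; in practice an embedding model produces these vectors
corpus = [
    ("Paris is the capital of France.", [0.9, 0.1]),
    ("The Eiffel Tower is 330 metres tall.", [0.7, 0.3]),
    ("Bicycles have two wheels.", [0.0, 1.0]),
]

def retrieve(query_vec, k=2):
    ranked = sorted(corpus, key=lambda item: cosine(query_vec, item[1]),
                    reverse=True)
    return [text for text, _ in ranked[:k]]

query = "What is the capital of France?"
query_vec = [0.95, 0.05]  # pretend embedding of the query

context = "\n".join(retrieve(query_vec))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
print(prompt)  # the augmented prompt passed to the language model
```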
Cosine Similarity
A metric that measures the similarity between two vectors by calculating the cosine of the angle between them. Values range from -1 (pointing in opposite directions) to 1 (pointing in the same direction), with 0 meaning the vectors are orthogonal, i.e. unrelated.
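The formula is the dot product of the two vectors divided by the product of their lengths, which a few lines of Python make concrete:

```python
import math

def cosine_similarity(a, b):
    # cos(theta) = (a . b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity([1, 0], [1, 0]))   # 1.0  (identical direction)
print(cosine_similarity([1, 0], [0, 1]))   # 0.0  (orthogonal)
print(cosine_similarity([1, 0], [-1, 0]))  # -1.0 (opposite)
```

Note that cosine similarity ignores vector length and compares direction only, which is why embeddings are often normalized before storage.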