Embeddings

Embeddings are dense vector representations of text, images, audio, or other data, learned so that semantically similar inputs end up near each other in vector space. They are the foundation of similarity search, recommendation systems, semantic clustering, and retrieval-augmented generation.

How they work

An embedding model maps an input (a sentence, an image, a chunk of code) to a fixed-length vector of floating-point numbers, typically 384, 768, 1024, or 1536 dimensions long. The model is trained so that pairs with similar meaning produce vectors close to each other under cosine similarity or dot product, and pairs with unrelated meaning produce distant vectors.
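
A minimal sketch of that comparison, with tiny hand-written vectors standing in for real model output (real embeddings have hundreds of dimensions; NumPy assumed):

    import numpy as np

    def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
        # Dot product of the two vectors after scaling each to unit length.
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    # Toy 4-dimensional "embeddings", chosen by hand for illustration.
    cat = np.array([0.9, 0.1, 0.0, 0.2])
    kitten = np.array([0.8, 0.2, 0.1, 0.3])
    invoice = np.array([0.0, 0.9, 0.8, 0.1])

    print(cosine_similarity(cat, kitten))   # high: related meanings
    print(cosine_similarity(cat, invoice))  # low: unrelated meanings

If the vectors are pre-normalised to unit length, cosine similarity reduces to a plain dot product, which is why the two scoring functions are often interchangeable in practice.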

At query time, the input embedding is compared against a corpus of stored embeddings using a nearest-neighbour search, typically accelerated by an approximate index such as HNSW, IVF, or DiskANN.
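
As a sketch of what that looks like in code, here is an HNSW index built and queried with the hnswlib library (one choice of several; any of the indexes above would do), with random vectors standing in for stored embeddings:

    import hnswlib
    import numpy as np

    dim = 384
    corpus = np.random.rand(10_000, dim).astype(np.float32)  # stand-in for stored embeddings

    index = hnswlib.Index(space="cosine", dim=dim)
    index.init_index(max_elements=len(corpus), ef_construction=200, M=16)
    index.add_items(corpus, ids=np.arange(len(corpus)))

    index.set_ef(64)  # query-time trade-off: higher searches more of the graph

    query = np.random.rand(dim).astype(np.float32)  # stand-in for the query embedding
    labels, distances = index.knn_query(query, k=5)
    print(labels[0], distances[0])  # ids and cosine distances of the 5 nearest neighbours

The search is approximate: M and ef_construction control graph density at build time, and ef tunes how exhaustively each query explores it, trading recall for latency.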

Common embedding models

  • OpenAI: text-embedding-3-small, text-embedding-3-large
  • Open source: BAAI bge family, E5, Nomic, GTE, Jina
  • Commercial: Cohere Embed, Voyage AI
  • Multimodal: CLIP (image and text), SigLIP, ImageBind
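
Most of the open-source families above can be loaded through the sentence-transformers library. A minimal sketch, using one illustrative BGE checkpoint:

    from sentence_transformers import SentenceTransformer

    # Checkpoint name is one example; other compatible models load the same way.
    model = SentenceTransformer("BAAI/bge-small-en-v1.5")

    sentences = ["How do I reset my password?", "Steps to recover account access"]
    embeddings = model.encode(sentences, normalize_embeddings=True)

    # Normalised vectors: cosine similarity is just the dot product.
    print(embeddings.shape)               # (2, 384) for this checkpoint
    print(embeddings[0] @ embeddings[1])  # high for near-paraphrases like these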