Vector Database

A vector database is a storage system optimised for similarity search over high-dimensional vectors, typically embeddings. Given a query vector, it uses an approximate nearest-neighbour index to return the closest vectors in the corpus, along with any associated metadata.
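To make the query model concrete, here is a minimal sketch of the exact (brute-force) search that a vector database approximates. The record layout, the `brute_force_search` helper, and the choice of cosine similarity are illustrative assumptions, not any particular product's API.

```python
import math
from heapq import nlargest


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def brute_force_search(query, records, k=2):
    """Score every record against the query and keep the k most similar.

    `records` is a list of (vector, metadata) pairs (a hypothetical layout).
    Returns (score, metadata) pairs, best match first.
    """
    scored = ((cosine_similarity(query, vec), meta) for vec, meta in records)
    return nlargest(k, scored, key=lambda pair: pair[0])


# Toy 3-dimensional "embeddings"; real embeddings have hundreds of dimensions.
records = [
    ([1.0, 0.0, 0.0], {"id": "doc-a"}),
    ([0.9, 0.1, 0.0], {"id": "doc-b"}),
    ([0.0, 1.0, 0.0], {"id": "doc-c"}),
    ([0.0, 0.0, 1.0], {"id": "doc-d"}),
]

results = brute_force_search([1.0, 0.05, 0.0], records, k=2)
for score, meta in results:
    print(meta["id"], round(score, 4))
```

This scan touches every stored vector, so its cost grows linearly with the corpus. That cost is what an index is meant to avoid.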

How it works

Vector databases index embeddings using approximate nearest-neighbour (ANN) algorithms such as HNSW, IVF, ScaNN, or DiskANN. These trade a small amount of recall for latency that can be orders of magnitude lower than a brute-force scan. Most also support hybrid search, combining vector similarity with traditional keyword filters or BM25 scoring.
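The recall-for-latency trade can be illustrated with a toy inverted-file (IVF) style index. This is a simplified sketch under stated assumptions: the centroids are sampled from the data rather than learned with k-means, distances are squared Euclidean, and all function names and the `nprobe` parameter are hypothetical choices for this example rather than a real library's interface.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_centroids(vector, centroids, count):
    """Indices of the `count` centroids closest to `vector`."""
    order = sorted(range(len(centroids)),
                   key=lambda i: squared_distance(vector, centroids[i]))
    return order[:count]


def build_ivf(vectors, centroids):
    """Assign each vector id to the inverted list of its nearest centroid."""
    lists = {i: [] for i in range(len(centroids))}
    for vec_id, vec in enumerate(vectors):
        cell = nearest_centroids(vec, centroids, 1)[0]
        lists[cell].append(vec_id)
    return lists


def ivf_search(query, vectors, centroids, lists, k, nprobe):
    """Scan only the `nprobe` lists whose centroids are closest to the query."""
    candidates = [vec_id
                  for cell in nearest_centroids(query, centroids, nprobe)
                  for vec_id in lists[cell]]
    candidates.sort(key=lambda vec_id: squared_distance(query, vectors[vec_id]))
    return candidates[:k]


def exact_search(query, vectors, k):
    """Brute-force baseline: rank every vector by distance to the query."""
    order = sorted(range(len(vectors)),
                   key=lambda i: squared_distance(query, vectors[i]))
    return order[:k]


rng = random.Random(0)  # fixed seed so the run is repeatable
vectors = [[rng.random() for _ in range(8)] for _ in range(500)]
centroids = rng.sample(vectors, 10)  # assumption: real indexes learn these
lists = build_ivf(vectors, centroids)
query = [rng.random() for _ in range(8)]

exact = exact_search(query, vectors, k=10)
approx = ivf_search(query, vectors, centroids, lists, k=10, nprobe=3)

recall = len(set(exact) & set(approx)) / len(exact)
scanned = sum(len(lists[c]) for c in nearest_centroids(query, centroids, 3))
print(f"recall@10 = {recall:.2f}, scanned {scanned} of {len(vectors)} vectors")
```

Raising `nprobe` scans more lists, which pushes recall toward 1.0 at the cost of more distance computations. Probing every list degenerates into the brute-force scan. Graph-based indexes such as HNSW make the same trade through different mechanics.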

Common products

  • Managed: Pinecone, Weaviate Cloud, Qdrant Cloud, Vespa Cloud, Turbopuffer
  • Self-hosted: Weaviate, Qdrant, Milvus, Vespa, Marqo
  • Embedded: Chroma, LanceDB, FAISS (library, not a database)
  • Extensions to existing databases: pgvector (PostgreSQL), MongoDB Atlas Vector Search, Redis Vector Search, Elasticsearch dense_vector