Reranker

A reranker is a model that takes an initial set of retrieved candidates and re-orders them to improve precision. In a typical retrieval pipeline, a fast first-stage retriever returns 50 to 200 candidates, and a slower but more accurate reranker scores each (query, candidate) pair and returns the top 5 to 10.
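The two-stage flow described above can be sketched in a few lines. This is a minimal illustration, not a production pipeline: `first_stage`, `rerank`, and `toy_pair_score` are hypothetical names, and both scoring functions are simple token-overlap stand-ins for a real retriever and a real reranker model.

```python
from typing import Callable, List, Set, Tuple


def tokens(text: str) -> Set[str]:
    """Lowercased whitespace tokens (toy tokenizer, for illustration only)."""
    return set(text.lower().split())


def first_stage(query: str, corpus: List[str], k: int = 100) -> List[str]:
    """Stand-in for a fast retriever: rank by number of shared tokens, keep k."""
    q = tokens(query)
    return sorted(corpus, key=lambda d: len(q & tokens(d)), reverse=True)[:k]


def rerank(
    query: str,
    candidates: List[str],
    score_pair: Callable[[str, str], float],
    top_n: int = 10,
) -> List[Tuple[str, float]]:
    """Score every (query, candidate) pair and keep the top_n highest scores."""
    scored = [(c, score_pair(query, c)) for c in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


def toy_pair_score(query: str, doc: str) -> float:
    """Stand-in for a reranker model: Jaccard similarity of the token sets."""
    q, d = tokens(query), tokens(doc)
    union = q | d
    return len(q & d) / len(union) if union else 0.0


corpus = [
    "rerankers reorder retrieved candidates",
    "a long document that mentions rerankers and candidates among many other unrelated words",
    "bananas are yellow",
]
query = "rerankers reorder candidates"
candidates = first_stage(query, corpus, k=2)  # wide, cheap pass
top = rerank(query, candidates, toy_pair_score, top_n=1)  # narrow, costly pass
print(top)
```

In a real system, `score_pair` would wrap a model or an API call, which is why the first stage has to narrow the corpus before the reranker runs.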

How it works

Most rerankers are cross-encoders: transformer models that take the query and the candidate as a single concatenated input and output a relevance score. This differs from bi-encoders (used for embeddings), which encode the query and the document separately. Cross-encoding is more accurate because the model attends across the pair, but its scores cannot be precomputed: each one depends on the query, so every candidate must be scored at query time.
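To make the difference concrete, here is a toy sketch of where the work happens in each design. Nothing here is a real model: `embed`, `bi_encoder_score`, and `cross_encoder_score` are hypothetical stand-ins (a bag-of-words vector and a word-order check) chosen only to show that document vectors can be built ahead of time while pair scores cannot.

```python
from collections import Counter
from math import sqrt
from typing import Dict


def embed(text: str) -> Dict[str, float]:
    """Toy bi-encoder: an L2-normalised bag-of-words vector for ONE text."""
    counts = Counter(text.lower().split())
    norm = sqrt(sum(v * v for v in counts.values())) or 1.0
    return {tok: v / norm for tok, v in counts.items()}


def bi_encoder_score(q_vec: Dict[str, float], d_vec: Dict[str, float]) -> float:
    """Dot product of two independently computed vectors."""
    return sum(w * d_vec.get(tok, 0.0) for tok, w in q_vec.items())


def cross_encoder_score(query: str, doc: str) -> float:
    """Toy cross-encoder: sees both texts together and returns the fraction
    of adjacent query word pairs that also appear adjacent in the document."""
    q, d = query.lower().split(), doc.lower().split()
    q_bigrams = list(zip(q, q[1:]))
    d_bigrams = set(zip(d, d[1:]))
    if not q_bigrams:
        return float(bool(q) and q[0] in d)
    return sum(b in d_bigrams for b in q_bigrams) / len(q_bigrams)


docs = ["dog bites man", "man bites dog"]

# Bi-encoder: document vectors are computed once, before any query arrives.
doc_index = [embed(d) for d in docs]

query = "man bites dog"
q_vec = embed(query)
bi_scores = [bi_encoder_score(q_vec, v) for v in doc_index]

# Cross-encoder: every (query, document) pair is scored at query time.
cross_scores = [cross_encoder_score(query, d) for d in docs]

print(bi_scores, cross_scores)
```

The toy bi-encoder cannot tell the two documents apart because each text is encoded without seeing the other, while the pair scorer can. Real bi-encoders are far less crude than a bag of words, so read this as an illustration of the data flow and not as a claim about model quality.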

Common rerankers

  • Commercial APIs: Cohere Rerank, Voyage Rerank, Jina Reranker
  • Open source: BAAI bge-reranker, mxbai-rerank, MS MARCO MiniLM cross-encoders
  • LLM-as-reranker: An instruction-tuned LLM is prompted to score each candidate. Slower than a dedicated cross-encoder, but flexible, since the relevance criteria are written in the prompt.
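
As a rough sketch of the LLM-as-reranker option: the prompt wording, the 0 to 10 scale, and the `complete` callable below are all assumptions made for illustration. `complete` stands in for whatever function sends a prompt to your LLM and returns its text reply; no specific provider API is implied.

```python
import re
from typing import Callable, List, Tuple

# Hypothetical prompt template; tune the wording and scale for your own model.
PROMPT = (
    "Rate how well the passage answers the query on a scale from 0 to 10.\n"
    "Reply with the number only.\n\n"
    "Query: {query}\n"
    "Passage: {passage}\n"
    "Score:"
)


def llm_rerank(
    query: str,
    passages: List[str],
    complete: Callable[[str], str],
    top_n: int = 5,
) -> List[Tuple[str, float]]:
    """Ask the LLM for one score per passage and return the top_n passages."""
    scored = []
    for passage in passages:
        reply = complete(PROMPT.format(query=query, passage=passage))
        match = re.search(r"\d+(?:\.\d+)?", reply)
        # Replies without a number are treated as irrelevant (score 0.0).
        score = float(match.group()) if match else 0.0
        scored.append((passage, score))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]
```

This makes one LLM call per candidate, which is where the extra latency comes from. Batching several passages into one prompt is a common variation, at the cost of more fragile output parsing.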