Aggregation Pipeline

The Aggregation Pipeline is MongoDB's framework for transforming and combining documents through a sequence of stages, conceptually similar to a SQL SELECT with GROUP BY, JOIN, and window functions. A pipeline is an array of stage operators applied in order; the output of one stage feeds the next.
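As a minimal sketch of what "an array of stage operators applied in order" looks like, the pipeline below is written as plain Python data. The `orders` collection and its `status`, `customer_id`, and `amount` fields are hypothetical, chosen only for illustration, and `stage_names` is a small helper invented here to show the execution order.

```python
# Hypothetical "orders" collection; all field names are assumptions.
pipeline = [
    # Stage 1: keep only completed orders (comparable to WHERE).
    {"$match": {"status": "completed"}},
    # Stage 2: total the amount per customer (comparable to GROUP BY).
    {"$group": {"_id": "$customer_id", "total_spent": {"$sum": "$amount"}}},
    # Stage 3: order the grouped results, largest total first (ORDER BY).
    {"$sort": {"total_spent": -1}},
]


def stage_names(stages):
    """Return the operator name of each stage, in execution order."""
    return [next(iter(stage)) for stage in stages]


print(stage_names(pipeline))
```

With PyMongo, a list like this would typically be passed to `Collection.aggregate`, for example `db.orders.aggregate(pipeline)`, where `db` is an assumed database handle.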

Common stages

  • $match: filter documents (equivalent to WHERE)
  • $project: reshape and select fields (equivalent to SELECT)
  • $group: group and aggregate (equivalent to GROUP BY)
  • $sort: sort documents (equivalent to ORDER BY)
  • $limit, $skip: pagination
  • $lookup: left outer join to another collection
  • $unwind: explode array fields into one document per element
  • $facet: run multiple independent sub-pipelines on the same input documents within a single stage
  • $graphLookup: recursive graph traversal
  • $merge, $out: write results to another collection
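Several of the stages above can be combined in one pipeline. The sketch below assumes a hypothetical `orders` collection whose documents hold an `items` array of `{product_id, quantity}` entries, and a hypothetical `products` collection with a `category` field; none of these names come from a real schema.

```python
# Hypothetical schema: orders.items[] = {product_id, quantity},
# products = {_id, category}. All names are assumptions.
top_categories_pipeline = [
    # One output document per element of the "items" array.
    {"$unwind": "$items"},
    # Left outer join: attach matching product documents as an array.
    {"$lookup": {
        "from": "products",
        "localField": "items.product_id",
        "foreignField": "_id",
        "as": "product",
    }},
    # Flatten the joined array; items with no matching product are dropped.
    {"$unwind": "$product"},
    # Sum the quantities sold per product category.
    {"$group": {
        "_id": "$product.category",
        "units_sold": {"$sum": "$items.quantity"},
    }},
    # Keep the five best-selling categories.
    {"$sort": {"units_sold": -1}},
    {"$limit": 5},
]
```

Note that the second `$unwind` effectively turns the left outer join into an inner join, because `$unwind` discards documents whose array is empty unless `preserveNullAndEmptyArrays` is set.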

Performance considerations

  • Place $match and $project early to reduce documents and fields flowing downstream.
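  • Use indexes to support early $match and $sort stages; these stages can generally use an index only when they run at the start of the pipeline, before stages such as $project, $unwind, or $group reshape the documents.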
  • $lookup is expensive; pre-denormalising data sometimes beats join-style queries.
  • Use explain() to inspect the executed plan and see which stages used indexes.
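The points above can be sketched as a pipeline that filters and trims first, plus the command document that asks the server to explain it. This assumes the same hypothetical `orders` collection with `status`, `customer_id`, and `amount` fields; `build_explain_command` is a helper invented for this example, not part of any driver.

```python
def build_explain_command(collection_name, pipeline, verbosity="executionStats"):
    """Build an explain command document for an aggregation pipeline.

    The command name ("explain") must be the first key of the document.
    """
    return {
        "explain": {
            "aggregate": collection_name,
            "pipeline": pipeline,
            "cursor": {},
        },
        "verbosity": verbosity,
    }


# Hypothetical fields; an index on "status" is assumed to exist.
pipeline = [
    # Filter first so an index can be used and fewer documents flow on.
    {"$match": {"status": "completed"}},
    # Keep only the fields that later stages need.
    {"$project": {"_id": 0, "customer_id": 1, "amount": 1}},
    # Aggregate the reduced documents.
    {"$group": {"_id": "$customer_id", "total_spent": {"$sum": "$amount"}}},
    # This sort runs on grouped output, so it cannot use a collection index.
    {"$sort": {"total_spent": -1}},
]

command = build_explain_command("orders", pipeline)
```

With PyMongo, the resulting document could be run as `db.command(command)`, where `db` is an assumed database handle. In the output, an `IXSCAN` stage typically indicates that the initial `$match` used an index, while `COLLSCAN` indicates a full collection scan.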