# MODE: Mixture of Document Experts for RAG

**Authors:** Rahul Anand

arXiv: 2509.00100 · 2025-09-03

## TL;DR

MODE is a lightweight cluster-and-route retrieval method for RAG over small, domain-specific corpora. On HotpotQA and SQuAD corpora of 100-500 chunks, it matches or exceeds a dense-retrieval baseline in answer quality while reducing end-to-end retrieval time.

## Contribution

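The paper proposes a cluster-and-route retrieval approach that replaces fine-grained nearest-neighbor search. Documents are embedded, clustered, and represented by cached centroids; each query is routed to the top centroid(s), and context is retrieved only within those clusters. This removes the need for an external vector database and a reranking stage in domain-specific RAG applications.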

## Key findings

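- On HotpotQA and SQuAD corpora of 100-500 chunks, MODE matches or exceeds a dense-retrieval baseline in answer quality.
- It reduces end-to-end retrieval time relative to that baseline.
- Cluster granularity and multi-cluster routing control the recall/precision trade-off.
- Tighter clusters improve downstream accuracy.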

## Abstract

Retrieval-Augmented Generation (RAG) often relies on large vector databases and cross-encoders tuned for large-scale corpora, which can be excessive for small, domain-specific collections. We present MODE (Mixture of Document Experts), a lightweight alternative that replaces fine-grained nearest-neighbor search with cluster-and-route retrieval. Documents are embedded, grouped into semantically coherent clusters, and represented by cached centroids. At query time, we route to the top centroid(s) and retrieve context only within those clusters, eliminating external vector-database infrastructure and reranking while keeping latency low. On HotpotQA and SQuAD corpora with 100-500 chunks, MODE matches or exceeds a dense-retrieval baseline in answer quality while reducing end-to-end retrieval time. Ablations show that cluster granularity and multi-cluster routing control the recall/precision trade-off, and that tighter clusters improve downstream accuracy. MODE offers a practical recipe for small and medium corpora where simplicity, speed, and topical focus matter.
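The cluster-and-route idea described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the function names (`build_clusters`, `route_and_retrieve`), the parameters `top_c` and `top_k`, the spherical k-means with farthest-first initialization, and the toy 4-dimensional "embeddings" are all assumptions made for illustration. A real pipeline would obtain the vectors from an embedding model, and the paper's clustering algorithm may differ.

```python
import numpy as np


def normalize(x: np.ndarray) -> np.ndarray:
    """Scale vectors to unit L2 norm so dot products equal cosine similarity."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.clip(norms, 1e-12, None)


def build_clusters(doc_vecs: np.ndarray, k: int, iters: int = 20):
    """Cluster unit-norm document vectors (assumed: simple spherical k-means).

    Returns (centroids, labels); the centroids are what would be cached.
    """
    # Deterministic farthest-first initialization (an illustrative choice).
    chosen = [0]
    for _ in range(1, k):
        sims = (doc_vecs @ doc_vecs[chosen].T).max(axis=1)
        chosen.append(int(np.argmin(sims)))
    centroids = doc_vecs[chosen].copy()

    for _ in range(iters):
        labels = np.argmax(doc_vecs @ centroids.T, axis=1)
        for c in range(k):
            members = doc_vecs[labels == c]
            if len(members):  # keep the old centroid if a cluster is empty
                centroids[c] = normalize(members.mean(axis=0))
    # Final assignment so labels agree with the returned centroids.
    labels = np.argmax(doc_vecs @ centroids.T, axis=1)
    return centroids, labels


def route_and_retrieve(query_vec, doc_vecs, centroids, labels, top_c=1, top_k=3):
    """Route the query to the top_c closest centroids, then rank only the
    documents inside those clusters. Returns (doc indices, routed cluster ids)."""
    q = normalize(query_vec)
    cluster_ids = np.argsort(centroids @ q)[::-1][:top_c]
    candidates = np.flatnonzero(np.isin(labels, cluster_ids))
    scores = doc_vecs[candidates] @ q
    order = np.argsort(scores)[::-1][:top_k]
    return candidates[order], cluster_ids


# Toy corpus: documents 0-4 lean toward one topic, documents 5-9 toward another.
rng = np.random.default_rng(42)
topic_a = np.array([1.0, 0.0, 0.0, 0.0])
topic_b = np.array([0.0, 1.0, 0.0, 0.0])
docs = normalize(np.vstack([
    topic_a + 0.05 * rng.normal(size=(5, 4)),
    topic_b + 0.05 * rng.normal(size=(5, 4)),
]))

centroids, labels = build_clusters(docs, k=2)

# A query close to the second topic is routed to that cluster only.
query = np.array([0.1, 0.9, 0.0, 0.0])
hits, routed = route_and_retrieve(query, docs, centroids, labels, top_c=1, top_k=3)
print("routed clusters:", routed.tolist())
print("retrieved doc indices:", hits.tolist())
```

In this sketch, raising `top_c` widens the candidate pool to more clusters, and raising `k` makes each cluster smaller. These two knobs stand in for the multi-cluster routing and cluster granularity that the abstract's ablations vary.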

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2509.00100/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/2509.00100/full.md

## References

18 references — full list in the complete paper: https://tomesphere.com/paper/2509.00100/full.md

---
Source: https://tomesphere.com/paper/2509.00100