Efficient Federated Search for Retrieval-Augmented Generation using Lightweight Routing
Akash Dhasade, Rachid Guerraoui, Anne-Marie Kermarrec, Diana Petrescu, Rafael Pires, Mathis Randl, Martijn de Vos

TL;DR
The paper presents RAGRoute, a lightweight neural routing method for federated search in retrieval-augmented generation systems, reducing communication and latency while maintaining retrieval accuracy.
Contribution
Introduces RAGRoute, a dynamic source-selection mechanism that improves efficiency in federated RAG systems without sacrificing retrieval quality.
Findings
Achieves up to 80.65% reduction in communication volume.
Achieves up to 52.50% reduction in latency.
Maintains retrieval accuracy comparable to querying all sources.
Abstract
Large language models (LLMs) achieve remarkable performance across domains but remain prone to hallucinations and inconsistencies. Retrieval-augmented generation (RAG) mitigates these issues by augmenting model inputs with relevant documents retrieved from external sources. In many real-world scenarios, relevant knowledge is fragmented across organizations or institutions, motivating the need for federated search mechanisms that can aggregate results from heterogeneous data sources without centralizing the data. We introduce RAGRoute, a lightweight routing mechanism for federated search in RAG systems that dynamically selects relevant data sources at query time using a neural classifier, avoiding indiscriminate querying. This selective routing reduces communication overhead and end-to-end latency while preserving retrieval quality, achieving up to 80.65% reductions in communication…
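The routing idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the `SourceRouter` class, its hand-picked weights, the 0.5 threshold, the two-dimensional query embedding, and the source names are all hypothetical, and a simple logistic scorer stands in for the neural classifier. The reduction is counted here in avoided source queries, which may differ from how the paper measures communication volume.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class SourceRouter:
    """Hypothetical per-source relevance scorer (stand-in for a neural classifier)."""
    weights: Sequence[float]  # illustrative values, not learned weights from the paper
    bias: float = 0.0

    def relevance(self, query_embedding: Sequence[float]) -> float:
        # Logistic score in (0, 1): a probability-like estimate of source relevance.
        z = self.bias + sum(w * x for w, x in zip(self.weights, query_embedding))
        return 1.0 / (1.0 + math.exp(-z))


def route(query_embedding: Sequence[float],
          routers: Dict[str, SourceRouter],
          threshold: float = 0.5) -> List[str]:
    """Return the names of sources whose relevance score meets the threshold."""
    return [name for name, router in routers.items()
            if router.relevance(query_embedding) >= threshold]


def query_reduction(num_selected: int, num_sources: int) -> float:
    """Fraction of source queries avoided compared with querying every source."""
    return 1.0 - num_selected / num_sources


# Hypothetical sources and a toy 2-dimensional query embedding.
routers = {
    "medical": SourceRouter(weights=[2.0, -1.0]),
    "legal": SourceRouter(weights=[-1.0, 2.0]),
    "finance": SourceRouter(weights=[-2.0, -2.0]),
}
query = [1.0, 0.0]

selected = route(query, routers)
reduction = query_reduction(len(selected), len(routers))
print(selected, f"{reduction:.2%}")
```

In this toy setup only the sources that pass the threshold are contacted, so the retrieval step skips the rest. Lowering the threshold trades some of that saving for a lower risk of missing a relevant source.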
