IndexRAG: Bridging Facts for Cross-Document Reasoning at Index Time
Zhenghua Bao, Yi Shi

TL;DR
IndexRAG introduces an offline indexing method for multi-hop question answering that improves retrieval efficiency and accuracy by generating and indexing bridging facts, outperforming existing graph-based approaches.
Contribution
It shifts cross-document reasoning to offline indexing, enabling single-pass retrieval and improving multi-hop QA performance without additional online processing.
Findings
IndexRAG improves F1 by 4.6 points over Naive RAG.
It requires only single-pass retrieval and one LLM call.
Outperforms graph-based baselines when combined with IRCoT.
Abstract
Multi-hop question answering (QA) requires reasoning across multiple documents, yet existing retrieval-augmented generation (RAG) approaches address this either through graph-based methods requiring additional online processing or iterative multi-step reasoning. We present IndexRAG, a novel approach that shifts cross-document reasoning from online inference to offline indexing. IndexRAG identifies bridge entities shared across documents and generates bridging facts as independently retrievable units, requiring no additional training or fine-tuning. Experiments on three widely-used multi-hop QA benchmarks (HotpotQA, 2WikiMultiHopQA, MuSiQue) show that IndexRAG improves F1 over Naive RAG by 4.6 points on average, while requiring only single-pass retrieval and a single LLM call at inference time. When combined with IRCoT, IndexRAG outperforms all graph-based baselines on average, including…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsTopic Modeling · Multimodal Machine Learning Applications · Advanced Graph Neural Networks
