DF-RAG: Query-Aware Diversity for Retrieval-Augmented Generation
Saadat Hasan Khan, Spencer Hong, Jingyu Wu, Kevin Lybarger, Youbing Yin, Erin Babinsky, Daben Liu

TL;DR
DF-RAG enhances retrieval-augmented generation by adding query-aware diversity to the retrieval step, tuning the level of diversity per query at test time. This improves performance on complex, reasoning-intensive questions without extra training.
Contribution
Introduces a novel method that optimizes diversity in retrieval for RAG, improving reasoning-intensive QA performance without additional fine-tuning.
Findings
DF-RAG improves F1 scores by 4-10% over vanilla RAG.
DF-RAG captures up to 91.3% of the estimated Oracle ceiling.
DF-RAG outperforms established baselines on reasoning-intensive benchmarks.
Abstract
Retrieval-augmented generation (RAG) is a common technique for grounding language model outputs in domain-specific information. However, RAG is often challenged by reasoning-intensive question-answering (QA), since common retrieval methods like cosine similarity maximize relevance at the cost of introducing redundant content, which can reduce information recall. To address this, we introduce Diversity-Focused Retrieval-Augmented Generation (DF-RAG), which systematically incorporates diversity into the retrieval step to improve performance on complex, reasoning-intensive QA benchmarks. DF-RAG builds upon the Maximal Marginal Relevance framework to select information chunks that are both relevant to the query and maximally dissimilar from each other. A key innovation of DF-RAG is its ability to optimize the level of diversity for each query dynamically at test time without requiring any…
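The selection rule the abstract builds on can be illustrated with the standard greedy Maximal Marginal Relevance procedure. At each step it picks the candidate chunk d that maximizes λ·sim(d, q) − (1 − λ)·max sim(d, d′). Here q is the query, and d′ ranges over the chunks already selected.

This is a minimal sketch under stated assumptions, not DF-RAG's implementation:
- It assumes cosine similarity on precomputed embedding vectors.
- It uses a single fixed λ. How DF-RAG chooses the diversity level per query is not described in the excerpt above, so that step is not shown.
- The names `cosine` and `mmr_select` are hypothetical.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def mmr_select(query: Sequence[float], chunks: List[Sequence[float]],
               k: int, lam: float) -> List[int]:
    """Greedy MMR selection (hypothetical helper; fixed lam, not DF-RAG's per-query tuning).

    lam = 1.0 reduces to plain top-k cosine-similarity retrieval;
    smaller values penalize chunks that are similar to ones already chosen.
    Returns indices into `chunks` in selection order.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be in [0, 1]")
    relevance = [cosine(query, c) for c in chunks]
    selected: List[int] = []
    candidates = list(range(len(chunks)))
    while candidates and len(selected) < k:
        def score(i: int) -> float:
            # Redundancy = similarity to the closest already-selected chunk.
            redundancy = max((cosine(chunks[i], chunks[j]) for j in selected),
                             default=0.0)
            return lam * relevance[i] - (1.0 - lam) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy example: chunks 0 and 1 are exact duplicates; chunk 2 is less relevant but novel.
query = [0.8, 0.6]
chunks = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
print(mmr_select(query, chunks, k=2, lam=1.0))  # → [0, 1]  (pure relevance keeps the duplicate)
print(mmr_select(query, chunks, k=2, lam=0.5))  # → [0, 2]  (diversity swaps in the novel chunk)
```

With λ = 1.0 the procedure behaves like ordinary cosine-similarity retrieval and returns the redundant duplicate. Lowering λ trades a little relevance for broader coverage, which is the redundancy problem the abstract describes.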
Taxonomy
Topics: Topic Modeling · Information Retrieval and Search Behavior · Natural Language Processing Techniques
