Separate the Wheat from the Chaff: Winnowing Down Divergent Views in Retrieval Augmented Generation
Song Wang, Zihan Chen, Peng Wang, Zhepei Wei, Zhen Tan, Yu Meng, Cong Shen, Jundong Li

TL;DR
WinnowRAG is a novel retrieval-augmented generation framework that filters out noisy documents through a two-stage process involving clustering and iterative winnowing, significantly improving response accuracy without model fine-tuning.
Contribution
The paper introduces WinnowRAG, a model-agnostic, two-stage filtering approach that enhances RAG by systematically removing irrelevant documents to improve answer quality.
Findings
WinnowRAG outperforms state-of-the-art baselines on multiple datasets.
The framework effectively filters noise without model fine-tuning.
WinnowRAG improves the relevance and accuracy of generated responses.
Abstract
Retrieval-augmented generation (RAG) enhances large language models (LLMs) by integrating external knowledge sources to address their limitations in accessing up-to-date or specialized information. A natural strategy to increase the likelihood of retrieving relevant information is to expand the number of retrieved documents. However, involving more documents could introduce significant noise, as many documents may be irrelevant or misleading, thereby reducing the overall accuracy of the generated responses. To overcome the challenge associated with handling a larger number of documents, we propose WinnowRAG, a novel RAG framework designed to systematically filter out noisy documents while preserving valuable content -- a process we refer to as winnowing. WinnowRAG operates in two stages: In Stage I, we perform query-aware clustering to group similar documents and form distinct topic…
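The Stage I grouping step can be illustrated with a toy sketch. The abstract above is truncated, so this is not the authors' method: it assumes that "query-aware" can be approximated by dropping the query's own terms (plus a small hypothetical stopword list) before comparing documents, and it swaps in greedy threshold clustering over bag-of-words cosine similarity for whatever clustering the paper actually uses. The names `cluster_documents`, `STOPWORDS`, and `threshold` are illustrative only. Stage II (winnowing) is not sketched, because its details are not given here.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list, for illustration only.
STOPWORDS = {"the", "is", "of", "in", "a", "what"}


def bow(text, ignore=frozenset()):
    """Lowercased bag-of-words counts, skipping stopwords and ignored terms."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS and t not in ignore)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def cluster_documents(query, docs, threshold=0.3):
    """Greedily group documents by what they say beyond the query.

    Assumption: query terms are removed so that documents are compared on
    their distinguishing content rather than on words they all share.
    Each document joins the first cluster whose seed document it resembles,
    otherwise it starts a new cluster. Returns lists of document indices.
    """
    query_terms = frozenset(bow(query))
    vectors = [bow(d, ignore=query_terms) for d in docs]
    clusters = []  # each cluster is a list of indices; the first is the seed
    for i, vec in enumerate(vectors):
        for cluster in clusters:
            if cosine(vec, vectors[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


query = "What is the capital of Australia?"
docs = [
    "The capital of Australia is Canberra.",
    "Canberra is the capital city of Australia.",
    "Sydney is the largest city in Australia.",
    "Sydney is the largest Australian city.",
]
clusters = cluster_documents(query, docs)
print(clusters)
```

In this toy run the two Canberra documents land in one group and the two Sydney documents in another, which is the kind of separation of divergent views that a later filtering stage could act on. A practical system would more likely compare dense embeddings than word counts.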
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Artificial Intelligence in Healthcare and Education
