Tackling the Inherent Difficulty of Noise Filtering in RAG
Jingyu Liu, Jiaen Lin, Yong Liu

TL;DR
This paper addresses the challenge of noise filtering in Retrieval-Augmented Generation by highlighting the inherent difficulty of filtering irrelevant documents and proposing a novel fine-tuning method to improve model robustness against noisy retrievals.
Contribution
We introduce a new fine-tuning approach that enhances LLMs' ability to distinguish relevant from irrelevant information in retrieved documents, improving robustness.
Findings
Significant performance gains across multiple benchmarks.
Standard fine-tuning is often ineffective against noisy retrievals.
Our method improves the model's ability to ignore irrelevant content.
Abstract
Retrieval-Augmented Generation (RAG) has become a widely adopted approach to enhance Large Language Models (LLMs) by incorporating external knowledge and reducing hallucinations. However, noisy or irrelevant documents are often introduced during RAG, potentially degrading performance and even causing hallucinated outputs. While various methods have been proposed to filter out such noise, we argue that identifying irrelevant information in retrieved content is inherently difficult and that a limited number of transformer layers can hardly solve it. Consequently, retrievers fail to filter out irrelevant documents entirely. Therefore, LLMs must be robust against such noise, but we demonstrate that standard fine-tuning approaches are often ineffective in enabling the model to selectively utilize relevant information while ignoring irrelevant content due to the structural constraints of…
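To make the noisy-retrieval setting concrete, the following is a minimal sketch of how relevant passages and irrelevant distractors might be mixed into a single RAG prompt when probing a model's robustness. It is not the authors' method or data pipeline. The function names (`build_noisy_context`, `format_prompt`, `noise_ratio`), the prompt layout, and the example passages are all hypothetical and chosen only for illustration.

```python
import random


def build_noisy_context(relevant, distractors, num_noise, seed=0):
    """Mix relevant passages with sampled distractors and shuffle them.

    Returns a list of (text, is_relevant) pairs. Hypothetical helper,
    not taken from the paper.
    """
    rng = random.Random(seed)  # seeded for reproducible sampling
    noise = rng.sample(distractors, k=min(num_noise, len(distractors)))
    docs = [(text, True) for text in relevant] + [(text, False) for text in noise]
    rng.shuffle(docs)  # avoid leaking relevance through position
    return docs


def format_prompt(question, docs):
    """Render the documents and question as one prompt (assumed layout)."""
    lines = [f"[{i}] {text}" for i, (text, _) in enumerate(docs, start=1)]
    return "Documents:\n" + "\n".join(lines) + f"\n\nQuestion: {question}\nAnswer:"


def noise_ratio(docs):
    """Fraction of documents in the context that are irrelevant."""
    if not docs:
        return 0.0
    return sum(1 for _, is_relevant in docs if not is_relevant) / len(docs)


if __name__ == "__main__":
    relevant = ["The Eiffel Tower is located in Paris."]
    distractors = [
        "Bananas are rich in potassium.",
        "The Pacific is the largest ocean.",
        "Mount Everest lies in the Himalayas.",
    ]
    docs = build_noisy_context(relevant, distractors, num_noise=2)
    print(format_prompt("Where is the Eiffel Tower?", docs))
    print("noise ratio:", noise_ratio(docs))
```

Varying `num_noise` gives a simple way to compare answer quality as the share of irrelevant documents in the context grows.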
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Information Retrieval and Search Behavior
