TL;DR
This paper investigates the distracting effect of irrelevant passages in Retrieval Augmented Generation (RAG), introduces a measure for this effect, and proposes methods to identify and leverage hard distracting passages to enhance LLM accuracy.
Contribution
The paper introduces a quantifiable measure of how strongly a passage distracts an LLM, develops methods to find hard distracting passages, and improves RAG performance by fine-tuning LLMs on these passages.
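The summary does not give the formula for the distraction measure. As an assumption for illustration only, one natural way to quantify it is to show the LLM the query together with a single irrelevant passage, ask it to either answer or abstain, and score the passage as 1 minus the probability of abstaining. Under this formulation, the less likely the model is to abstain, the more the passage distracts it. `distracting_effect`, `abstain_prob`, and the toy probabilities are hypothetical. Obtaining the abstention probability from a real model is left as a stub.

```python
from typing import Callable

# Hypothetical signature: abstain_prob(query, passage) returns the probability
# that the LLM abstains (e.g. emits a "no answer" label) when shown only `passage`.
AbstainProb = Callable[[str, str], float]


def distracting_effect(query: str, passage: str, abstain_prob: AbstainProb) -> float:
    """Assumed score: 1 - P(abstain | query, irrelevant passage).

    Higher values mean the passage is more likely to pull the LLM
    into answering instead of abstaining.
    """
    p = abstain_prob(query, passage)
    if not 0.0 <= p <= 1.0:
        raise ValueError(f"abstention probability out of range: {p}")
    return 1.0 - p


# Toy stand-in for a real LLM call: fixed, made-up abstention probabilities.
toy_probs = {
    ("Who wrote Hamlet?", "Hamlet is a city in North Carolina."): 0.25,
    ("Who wrote Hamlet?", "Photosynthesis converts light into chemical energy."): 0.75,
}


def toy_abstain_prob(query: str, passage: str) -> float:
    return toy_probs[(query, passage)]


for (q, p) in toy_probs:
    print(f"{distracting_effect(q, p, toy_abstain_prob):.2f}  {p}")
```

With these made-up numbers, the topically related but irrelevant passage scores 0.75 and the unrelated one scores 0.25. This matches the intuition that passages sharing surface features with the query are harder to ignore.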
Findings
Up to 7.5% higher answering accuracy than LLMs fine-tuned on conventional RAG datasets.
The distracting-effect measure is robust across LLMs.
A novel framework for identifying and utilizing hard distracting passages.
Abstract
A well-known issue with Retrieval Augmented Generation (RAG) is that retrieved passages that are irrelevant to the query sometimes distract the answer-generating LLM, causing it to provide an incorrect response. In this paper, we shed light on this core issue and formulate the distracting effect of a passage w.r.t. a query (and an LLM). We provide a quantifiable measure of the distracting effect of a passage and demonstrate its robustness across LLMs. Our research introduces novel methods for identifying and using hard distracting passages to improve RAG systems. By fine-tuning LLMs with these carefully selected distracting passages, we achieve up to a 7.5% increase in answering accuracy compared to counterparts fine-tuned on conventional RAG datasets. Our contribution is two-fold: first, we move beyond the simple binary classification of irrelevant passages as either completely…
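The abstract describes fine-tuning on "carefully selected" distracting passages but does not state the selection rule. The following is a minimal sketch that assumes the simplest rule: rank candidate irrelevant passages by a precomputed distraction score, keep the top k, and mix them with the gold passage to form one training context. `Candidate`, `hard_distractors`, `build_training_context`, and the toy scores are hypothetical and are not the paper's own method.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    text: str
    score: float  # precomputed distraction score in [0, 1]; higher = more distracting


def hard_distractors(candidates: List[Candidate], k: int) -> List[Candidate]:
    """Return the k candidates with the highest distraction score."""
    return sorted(candidates, key=lambda c: c.score, reverse=True)[:k]


def build_training_context(
    gold: str, candidates: List[Candidate], k: int, seed: int = 0
) -> List[str]:
    """Hypothetical fine-tuning context: the gold passage plus k hard distractors."""
    passages = [gold] + [c.text for c in hard_distractors(candidates, k)]
    # Shuffle so the gold passage does not always sit in the same position.
    random.Random(seed).shuffle(passages)
    return passages


# Made-up candidate pool for illustration.
pool = [
    Candidate("Hamlet is a city in North Carolina.", 0.75),
    Candidate("Photosynthesis converts light into chemical energy.", 0.25),
    Candidate("Macbeth is a tragedy set in Scotland.", 0.5),
]
print(build_training_context("Hamlet was written by William Shakespeare.", pool, k=2))
```

Top-k selection is only the most direct reading of "hard" distractors. Training on such contexts would expose the model to the passages most likely to mislead it, rather than to randomly chosen irrelevant ones.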
Taxonomy
Methods: Attention Is All You Need · Linear Warmup With Linear Decay · Dropout · Layer Normalization · Byte Pair Encoding · Attention Dropout · Softmax · Residual Connection · WordPiece
