Noise-Robust Abstractive Compression in Retrieval-Augmented Language Models
Singon Kim

TL;DR
This paper introduces ACoRN, a noise-robust abstractive compression method for retrieval-augmented models, improving answer accuracy by handling irrelevant or misleading retrieved documents through novel training strategies.
Contribution
We propose ACoRN, a new training framework that enhances the robustness of language model compressors against retrieval noise, improving factual accuracy in long-context summarization.
Findings
ACoRN improves EM and F1 scores on relevant datasets.
ACoRN preserves the answer string in its compressed output, which can serve as direct evidence for the answer.
The method is especially effective in noisy retrieval scenarios.
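The findings above are stated in terms of EM, F1, and answer-string preservation. As a rough illustration of what these metrics measure, here is a minimal sketch that assumes the common SQuAD-style answer normalization (lowercasing, stripping punctuation and English articles). The function names and the substring check in `contains_answer` are hypothetical, not taken from the paper, and the paper's own evaluation script may differ.

```python
import re
import string
from collections import Counter


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace.

    Assumption: SQuAD-style normalization; the paper may normalize differently.
    """
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> float:
    """EM: 1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(reference))


def f1_score(prediction: str, reference: str) -> float:
    """Token-level F1 between the normalized prediction and reference."""
    pred_tokens = normalize_answer(prediction).split()
    ref_tokens = normalize_answer(reference).split()
    if not pred_tokens or not ref_tokens:
        # If either side is empty, score 1.0 only when both are empty.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it appears on both sides.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def contains_answer(compressed_context: str, reference: str) -> bool:
    """Hypothetical fidelity check: does the compressed text keep the answer?

    This is a plain substring test on normalized text, so it can also match
    inside a longer word.
    """
    return normalize_answer(reference) in normalize_answer(compressed_context)
```

For example, `exact_match("The Eiffel Tower", "eiffel tower!")` scores a match because the article and punctuation are normalized away, while `f1_score("Paris, France", "Paris")` gives partial credit for the one shared token.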
Abstract
Abstractive compression utilizes smaller language models to condense query-relevant context, reducing computational costs in retrieval-augmented generation (RAG). However, retrieved documents often include information that is either irrelevant to answering the query or misleading due to factually incorrect content, despite having high relevance scores. This behavior indicates that abstractive compressors are more likely to omit important information essential for the correct answer, especially in long contexts where attention dispersion occurs. To address this issue, we categorize retrieved documents in a more fine-grained manner and propose Abstractive Compression Robust against Noise (ACoRN), which introduces two novel training steps. First, we use offline data augmentation on the training dataset to enhance compressor robustness against two distinct types of retrieval noise. Second,…
Taxonomy
Topics: Topic Modeling · Information Retrieval and Search Behavior · Natural Language Processing Techniques
