Noise-Robust Abstractive Compression in Retrieval-Augmented Language Models

Singon Kim

arXiv:2512.08943·cs.CL·December 11, 2025

TL;DR

This paper introduces ACoRN, a noise-robust abstractive compression method for retrieval-augmented models, improving answer accuracy by handling irrelevant or misleading retrieved documents through novel training strategies.

Contribution

We propose ACoRN, a new training framework that enhances the robustness of language model compressors against retrieval noise, improving factual accuracy in long-context summarization.

Findings

01. ACoRN improves exact match (EM) and F1 scores on the evaluated datasets.

02. ACoRN preserves the answer string in its compressed output, which can serve as direct evidence for the answer.

03. The method is especially effective in noisy retrieval scenarios.
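
As a point of reference for the metrics named in these findings, the sketch below shows one common way EM, token-level F1, and an answer-string check are computed in question answering. It follows the widely used SQuAD-style normalization (lowercasing, stripping punctuation and English articles); whether the paper uses exactly this normalization is an assumption, and the function names `normalize`, `exact_match`, `token_f1`, and `contains_answer` are hypothetical, chosen only for illustration.

```python
import re
import string
from collections import Counter

_PUNCT = set(string.punctuation)


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in _PUNCT)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> float:
    """1.0 if the normalized prediction equals the normalized gold answer."""
    return float(normalize(prediction) == normalize(gold))


def token_f1(prediction: str, gold: str) -> float:
    """Harmonic mean of token precision and recall after normalization."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    if not pred_tokens or not gold_tokens:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred_tokens == gold_tokens)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def contains_answer(summary: str, gold: str) -> bool:
    """Assumed proxy for 'answer string preserved': the normalized gold
    answer appears as a whole-word span in the normalized summary."""
    norm_gold = normalize(gold)
    if not norm_gold:
        return False
    return f" {norm_gold} " in f" {normalize(summary)} "
```

A compressor's output could be passed to `contains_answer` to check whether the gold answer survived compression, while `exact_match` and `token_f1` would be applied to the downstream model's final answer.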

Abstract

Abstractive compression utilizes smaller language models to condense query-relevant context, reducing computational costs in retrieval-augmented generation (RAG). However, retrieved documents often include information that is either irrelevant to answering the query or misleading due to factually incorrect content, despite having high relevance scores. This behavior indicates that abstractive compressors are more likely to omit important information essential for the correct answer, especially in long contexts where attention dispersion occurs. To address this issue, we categorize retrieved documents in a more fine-grained manner and propose Abstractive Compression Robust against Noise (ACoRN), which introduces two novel training steps. First, we use offline data augmentation on the training dataset to enhance compressor robustness against two distinct types of retrieval noise. Second,…

Taxonomy

Topics: Topic Modeling · Information Retrieval and Search Behavior · Natural Language Processing Techniques