Less is More for RAG: Information Gain Pruning for Generator-Aligned Reranking and Evidence Selection

Zhipeng Song; Yizhi Zhou; Xiangyu Kong; Jiulong Jiao; Xinrui Bao; Xu You; Xueqing Shi; Yuhang Zhou; Heng Qi

arXiv:2601.17532·cs.CL·January 27, 2026

Open Access

TL;DR

This paper introduces Information Gain Pruning (IGP), a method for selecting and filtering evidence in retrieval-augmented generation to improve answer quality and efficiency, especially under limited context budgets.

Contribution

The paper proposes IGP, a reranking-and-pruning module that selects evidence using a generator-aligned utility signal and filters weak or harmful passages before truncation, without changing existing budget interfaces.

Findings

01 IGP improves the QA quality-cost trade-off across five open-domain QA benchmarks and multiple retrievers and generators.

02 IGP achieves a 12-20% relative F1 improvement in multi-evidence settings.

03 IGP reduces input tokens by approximately 76-79% compared to baselines.

Abstract

Retrieval-augmented generation (RAG) grounds large language models with external evidence, but under a limited context budget, the key challenge is deciding which retrieved passages should be injected. We show that retrieval relevance metrics (e.g., NDCG) correlate weakly with end-to-end QA quality and can even become negatively correlated under multi-passage injection, where redundancy and mild conflicts destabilize generation. We propose Information Gain Pruning (IGP), a deployment-friendly reranking-and-pruning module that selects evidence using a generator-aligned utility signal and filters weak or harmful passages before truncation, without changing existing budget interfaces. Across five open-domain QA benchmarks and multiple retrievers and generators, IGP consistently improves the quality-cost trade-off. In a representative multi-evidence setting, IGP delivers about…
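The rerank, filter, then truncate flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not define the utility signal, so the `utility` field here is a hypothetical precomputed score supplied by the caller, and `Passage`, `prune_and_truncate`, `token_budget`, and `min_utility` are illustrative names only.

```python
from dataclasses import dataclass


@dataclass
class Passage:
    text: str
    utility: float  # hypothetical generator-aligned score (assumed precomputed)
    n_tokens: int


def prune_and_truncate(passages, token_budget, min_utility=0.0):
    """Drop passages at or below min_utility, rerank by utility, then fit the budget."""
    # Filter first so weak or harmful passages never compete for the budget.
    kept = [p for p in passages if p.utility > min_utility]
    # Rerank: highest utility first (the sort is stable, so ties keep retrieval order).
    kept.sort(key=lambda p: p.utility, reverse=True)
    selected, used = [], 0
    for p in kept:
        if used + p.n_tokens > token_budget:
            break  # plain truncation: stop at the first passage that does not fit
        selected.append(p)
        used += p.n_tokens
    return selected, used


# Illustrative data only; the scores are made up for the example.
retrieved = [
    Passage("a", 0.1, 100),
    Passage("b", -0.3, 120),
    Passage("c", 0.8, 90),
    Passage("d", 0.0, 80),
    Passage("e", 0.4, 150),
]
selected, used = prune_and_truncate(retrieved, token_budget=300)
print([p.text for p in selected], used)
```

Because filtering happens before truncation, the token budget is an upper bound rather than a target: when few passages clear the threshold, fewer tokens are sent to the generator, which is one plausible way a pruning step could reduce input tokens without changing the budget parameter itself.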

Taxonomy

Topics: Topic Modeling · Information Retrieval and Search Behavior · Multimodal Machine Learning Applications