IG-Search: Step-Level Information Gain Rewards for Search-Augmented Reasoning

Zihan Liang; Yufei Ma; Ben Chen; Zhipeng Qian; Huangyu Dai; Lingtao Mao; Xuxin Zhang; Chenyi Lei; Wenwu Ou

arXiv:2604.15148·cs.AI·April 17, 2026

TL;DR

IG-Search introduces a step-level reward based on Information Gain (IG) for reinforcement learning (RL) in search-augmented reasoning, enabling fine-grained credit assignment and improving performance on QA benchmarks.

Contribution

The paper proposes a novel IG-based step-level reward that requires no intermediate annotations and makes search queries more effective during RL training of language models.

Findings

1. Outperforms trajectory-level baselines by 1.6 points on average across benchmarks.
2. Achieves an average exact match (EM) of 0.430 with Qwen2.5-3B, with particularly strong gains on multi-hop reasoning.
3. Adds only ~6.4% to training time without affecting inference latency.

Abstract

Reinforcement learning has emerged as an effective paradigm for training large language models to perform search-augmented reasoning. However, existing approaches rely on trajectory-level rewards that cannot distinguish precise search queries from vague or redundant ones within a rollout group, and collapse to a near-zero gradient signal whenever every sampled trajectory fails. In this paper, we propose IG-Search, a reinforcement learning framework that introduces a step-level reward based on Information Gain (IG). For each search step, IG measures how much the retrieved documents improve the model's confidence in the gold answer relative to a counterfactual baseline of random documents, thereby reflecting the effectiveness of the underlying search query. This signal is fed back to the corresponding search-query tokens via per-token advantage modulation in GRPO, enabling fine-grained,…
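The abstract describes two mechanisms: a per-step IG score measured against a counterfactual baseline of random documents, and per-token advantage modulation in GRPO. Because the abstract is truncated before any formulas, the following is only a minimal Python sketch under stated assumptions, not the authors' implementation. It assumes that "confidence in the gold answer" is the policy's log-probability of the gold answer, that the random-document baseline is averaged over several draws, and that modulation simply adds `alpha` times a step's IG to that step's query tokens. The function names, `alpha`, and the span format are hypothetical. The sketch also reproduces the failure mode the abstract mentions: when every trajectory in a group fails, the group-normalized advantage is zero everywhere.

```python
from statistics import mean, pstdev


def information_gain(logp_gold_retrieved, logp_gold_random):
    """Hypothetical IG for one search step: the policy's log-probability of the
    gold answer given the retrieved documents, minus its mean log-probability
    given randomly sampled documents (the counterfactual baseline)."""
    return logp_gold_retrieved - mean(logp_gold_random)


def group_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: normalize each trajectory reward within its
    rollout group. If every reward is identical (e.g. all trajectories fail),
    every advantage is 0.0 and the trajectory-level signal vanishes."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def token_advantages(traj_adv, num_tokens, query_spans, step_ig, alpha=0.5):
    """Assumed modulation rule (hypothetical): start from the trajectory
    advantage on every token, then add alpha * IG_t to the tokens of the t-th
    search query. query_spans holds half-open (start, end) token ranges."""
    adv = [traj_adv] * num_tokens
    for (start, end), ig in zip(query_spans, step_ig):
        for k in range(start, end):
            adv[k] += alpha * ig
    return adv


if __name__ == "__main__":
    # A rollout group in which every trajectory fails: all rewards are 0.
    group_adv = group_advantages([0.0, 0.0, 0.0, 0.0])

    # Made-up log-probabilities for two search steps of the first trajectory:
    # step 1 retrieves useful evidence, step 2 is no better than random docs.
    ig = [
        information_gain(-1.2, [-3.0, -2.8]),
        information_gain(-3.1, [-3.0, -3.2]),
    ]

    # Ten response tokens; the two queries occupy tokens 2-4 and 6-7.
    adv = token_advantages(group_adv[0], 10, [(2, 5), (6, 8)], ig)
    print([round(a, 3) for a in adv])
```

In the all-fail group above, every token's trajectory-level advantage is 0.0, yet the tokens of the first query still receive a positive signal because its retrieved documents raised the gold answer's log-probability above the random baseline, while the second query's tokens receive roughly nothing. Whether the paper adds, multiplies, or normalizes IG before combining it with the advantage is not stated in the visible abstract.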