IG-Search: Step-Level Information Gain Rewards for Search-Augmented Reasoning
Zihan Liang, Yufei Ma, Ben Chen, Zhipeng Qian, Huangyu Dai, Lingtao Mao, Xuxin Zhang, Chenyi Lei, Wenwu Ou

TL;DR
IG-Search introduces a step-level reward based on Information Gain for reinforcement learning in search-augmented reasoning, enabling fine-grained credit assignment and improved performance on QA benchmarks.
Contribution
IG-Search proposes a novel IG-based step-level reward that requires no intermediate annotations and improves the effectiveness of search queries during RL training of language models.
Findings
Outperforms trajectory-level baselines by 1.6 points on average across benchmarks.
Achieves an average exact match (EM) of 0.430 with Qwen2.5-3B, with multi-hop reasoning benefiting most.
Adds only ~6.4% to training time without affecting inference latency.
Abstract
Reinforcement learning has emerged as an effective paradigm for training large language models to perform search-augmented reasoning. However, existing approaches rely on trajectory-level rewards that cannot distinguish precise search queries from vague or redundant ones within a rollout group, and collapse to a near-zero gradient signal whenever every sampled trajectory fails. In this paper, we propose IG-Search, a reinforcement learning framework that introduces a step-level reward based on Information Gain (IG). For each search step, IG measures how much the retrieved documents improve the model's confidence in the gold answer relative to a counterfactual baseline of random documents, thereby reflecting the effectiveness of the underlying search query. This signal is fed back to the corresponding search-query tokens via per-token advantage modulation in GRPO, enabling fine-grained,…
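The abstract describes the step-level reward only in words, so the following is a minimal sketch of the idea, not the paper's implementation. It assumes that "confidence in the gold answer" is measured as a log-probability, that the counterfactual baseline is the mean over a few random-document samples, and that the IG signal is added to the trajectory-level GRPO advantage with a fixed weight. The function names, the `weight` parameter, and the additive form are all hypothetical.

```python
from typing import List, Sequence, Tuple


def information_gain(logp_gold_retrieved: float,
                     logp_gold_random: Sequence[float]) -> float:
    """IG for one search step (assumed form).

    Gold-answer log-probability given the retrieved documents, minus the
    mean gold-answer log-probability given random documents.
    """
    baseline = sum(logp_gold_random) / len(logp_gold_random)
    return logp_gold_retrieved - baseline


def grpo_group_advantages(rewards: Sequence[float],
                          eps: float = 1e-6) -> List[float]:
    """Standard group-normalized advantages: (r - mean) / (std + eps)."""
    mean = sum(rewards) / len(rewards)
    std = (sum((r - mean) ** 2 for r in rewards) / len(rewards)) ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]


def modulate_token_advantages(traj_advantage: float,
                              num_tokens: int,
                              query_spans: Sequence[Tuple[int, int]],
                              step_igs: Sequence[float],
                              weight: float = 0.5) -> List[float]:
    """Per-token advantages for one trajectory (assumed additive scheme).

    Every token starts from the trajectory-level advantage; tokens inside
    the half-open span [start, end) of a search query also receive
    weight * IG of that search step.
    """
    advantages = [traj_advantage] * num_tokens
    for (start, end), ig in zip(query_spans, step_igs):
        for t in range(start, end):
            advantages[t] += weight * ig
    return advantages


# A rollout group in which every trajectory fails: the trajectory-level
# advantages are all zero, so only the IG term carries a signal.
group_adv = grpo_group_advantages([0.0, 0.0, 0.0])
ig = information_gain(-1.0, [-3.0, -2.0])
token_adv = modulate_token_advantages(group_adv[0], 5, [(1, 3)], [ig])
```

In this toy group the trajectory-level advantages are all zero, yet the query tokens at positions 1 and 2 still receive a nonzero advantage, which is the failure case the abstract says trajectory-level rewards cannot handle.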
