InfoFlow: Reinforcing Search Agent Via Reward Density Optimization

Kun Luo; Hongjin Qian; Zheng Liu; Ziyi Xia; Shitao Xiao; Siqi Bao; Jun Zhao; Kang Liu

arXiv:2510.26575 · cs.CL · October 31, 2025

TL;DR

This paper introduces InfoFlow, a framework for reinforcement learning of deep search agents. It raises reward density, the reward obtained per unit of exploration cost, through subproblem decomposition, failure-guided hints, and dual-agent refinement, improving both training efficiency and task performance.

Contribution

The paper proposes InfoFlow, a systematic framework that addresses low reward density in deep search RL through task decomposition, failure-guided hints, and a dual-agent architecture.

Findings

1. Significantly outperforms baselines on multiple benchmarks.

2. Enables lightweight LLMs to match advanced proprietary LLMs.

3. Improves exploration efficiency and reward acquisition.

Abstract

Reinforcement Learning with Verifiable Rewards (RLVR) is a promising approach for enhancing agentic deep search. However, its application is often hindered by low Reward Density in deep search scenarios, where agents expend significant exploratory costs for infrequent and often null final rewards. In this paper, we formalize this challenge as the Reward Density Optimization problem, which aims to improve the reward obtained per unit of exploration cost. We introduce InfoFlow, a systematic framework that tackles this problem from three aspects. 1) Subproblem decomposition: breaking down long-range tasks to assign process rewards, thereby providing denser learning signals. 2) Failure-guided hints: injecting corrective guidance into stalled trajectories to increase the probability of successful outcomes. 3) Dual-agent…
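The abstract describes reward density as the reward obtained per unit of exploration cost. The paper's formal definition is not reproduced on this page, so the snippet below is only a minimal illustrative sketch under assumptions: the `Trajectory` class, its `reward` and `cost` fields, and the choice of measuring cost as a count of search steps are all hypothetical and not taken from the paper.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Trajectory:
    """One rollout of a search agent (hypothetical structure, for illustration)."""
    reward: float  # verifiable final reward, often 0.0 in deep search
    cost: float    # exploration cost; assumed here to be a count of search steps


def reward_density(trajectories: Iterable[Trajectory]) -> float:
    """Total reward divided by total exploration cost across rollouts.

    This ratio is an assumed reading of "reward per unit of exploration
    cost"; the paper may define the quantity differently.
    """
    rollouts = list(trajectories)
    total_cost = sum(t.cost for t in rollouts)
    if total_cost == 0:
        return 0.0  # no exploration happened, so no density to report
    return sum(t.reward for t in rollouts) / total_cost


# One successful rollout and one long failed rollout.
sparse = [Trajectory(reward=1.0, cost=10), Trajectory(reward=0.0, cost=30)]
print(reward_density(sparse))
```

Under this reading, a long rollout that ends with a null reward lowers the ratio, which is the situation the abstract says subproblem decomposition and failure-guided hints are meant to improve.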
