InfoPO: Information-Driven Policy Optimization for User-Centric Agents

Fanqi Kong; Jiayi Zhang; Mingyi Deng; Chenglin Wu; Yuyu Luo; Bang Liu

arXiv:2603.00656 · cs.AI · March 3, 2026

TL;DR

InfoPO is an information-driven policy optimization method for multi-turn, user-centric agents. It rewards the interaction turns that yield information gain and improves on prompting and RL baselines across diverse tasks.

Contribution

The paper frames multi-turn interaction as active uncertainty reduction and proposes a turn-level information-gain reward to improve multi-turn policy learning in user-centric agents.

Findings

01 Outperforms prompting and RL baselines across tasks.

02 Demonstrates robustness to user simulator shifts.

03 Generalizes well to environment-interactive tasks.

Abstract

Real-world user requests to LLM agents are often underspecified. Agents must interact to acquire missing information and make correct downstream decisions. However, current multi-turn GRPO-based methods often rely on trajectory-level reward computation, which leads to credit assignment problems and insufficient advantage signals within rollout groups. A feasible approach is to identify valuable interaction turns at a fine granularity to drive more targeted learning. To address this, we introduce InfoPO (Information-Driven Policy Optimization), which frames multi-turn interaction as a process of active uncertainty reduction and computes an information-gain reward that credits turns whose feedback measurably changes the agent's subsequent action distribution compared to a masked-feedback counterfactual. It then combines this signal with task outcomes via an adaptive variance-gated fusion…
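
The two ideas in the abstract can be illustrated with a minimal sketch, under stated assumptions. The abstract does not give the exact reward formula, so KL divergence is used here as one plausible way to measure how much a turn's feedback changes the next action distribution relative to a masked-feedback counterfactual. The function names (`kl_divergence`, `info_gain_rewards`, `group_relative_advantages`) are hypothetical and not taken from the paper. The variance-gated fusion is not sketched because the abstract is truncated before describing it.

```python
import math
from typing import List, Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) for two discrete distributions over the same actions."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def info_gain_rewards(
    with_feedback: Sequence[Sequence[float]],
    masked_feedback: Sequence[Sequence[float]],
) -> List[float]:
    """Per-turn reward: how far the action distribution conditioned on the
    real feedback moves away from the masked-feedback counterfactual.
    ASSUMPTION: KL divergence stands in for the paper's actual measure."""
    return [kl_divergence(p, q) for p, q in zip(with_feedback, masked_feedback)]


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-8) -> List[float]:
    """GRPO-style advantage: standardize trajectory-level rewards within a
    rollout group. If all rewards are equal, every advantage is zero."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(var)
    return [(r - mean) / (std + eps) for r in rewards]
```

When every rollout in a group receives the same outcome reward, `group_relative_advantages` returns all zeros, which is the "insufficient advantage signal" the abstract refers to. A turn-level signal such as `info_gain_rewards` can still differ across turns in that case, since it is zero only for turns whose feedback leaves the action distribution unchanged.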

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Multimodal Machine Learning Applications