AEM: Adaptive Entropy Modulation for Multi-Turn Agentic Reinforcement Learning
Haotian Zhao, Songlin Zhou, Yuxin Zhang, Stephen S.-T. Yau, Wenyu Zhang, Lun Tian, Tianshu Zhu, Yifeng Huang, Yucheng Zeng, Jingnan Gu, Daxiang Dong, and Jianmin Wu

TL;DR
AEM is a supervision-free credit assignment method that adaptively modulates entropy dynamics during RL training to improve the exploration-exploitation trade-off in multi-turn LLM agent tasks.
Contribution
It proposes a response-level entropy modulation technique that aligns uncertainty estimation with LLM action granularity, reducing reliance on dense supervision.
Findings
AEM improves RL performance across multiple benchmarks.
AEM achieves a +1.4% gain in software engineering RL tasks.
AEM effectively balances exploration and exploitation without additional supervision.
Abstract
Reinforcement learning (RL) has substantially improved the ability of large language model (LLM) agents to interact with environments and solve multi-turn tasks. However, effective agentic RL remains challenging: sparse outcome-only rewards provide limited guidance for assigning credit to individual steps within long interaction trajectories. Existing approaches often introduce dense intermediate supervision, such as process reward models or auxiliary self-supervised signals, which increases supervision and tuning complexity and may limit generalization across tasks and domains. We present AEM, a supervision-free credit assignment method that adaptively modulates entropy dynamics during RL training to improve the exploration-exploitation trade-off. Since in agentic RL the environment is typically affected by a complete response, rather than an individual token, our analysis lifts…
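To make the token-versus-response distinction concrete, here is a minimal sketch of how an uncertainty estimate might be computed at each granularity. It is not the paper's formulation, which the truncated abstract does not spell out. The function names, the toy distributions, and the choice to sum per-token entropies (which, by the chain rule of entropy, gives a single-sample estimate of the entropy of the whole response) are all assumptions made for illustration.

```python
import math
from typing import Sequence


def token_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a single next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def response_entropy(step_probs: Sequence[Sequence[float]]) -> float:
    """Sum of per-token entropies along one sampled response.

    Illustrative assumption: by the chain rule of entropy, this sum is a
    single-sample estimate of the entropy of the complete response.
    """
    return sum(token_entropy(p) for p in step_probs)


# Hypothetical toy response: three tokens over a 4-word vocabulary.
toy_response = [
    [0.25, 0.25, 0.25, 0.25],  # maximally uncertain step
    [1.0, 0.0, 0.0, 0.0],      # fully confident step
    [0.5, 0.5, 0.0, 0.0],      # two-way choice
]

per_token = [token_entropy(p) for p in toy_response]  # one value per token
per_response = response_entropy(toy_response)         # one value per action
```

In this sketch the token-level view yields one number per generated token, while the response-level view yields one number per agent action, which matches the unit the environment actually reacts to.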
