AEM: Adaptive Entropy Modulation for Multi-Turn Agentic Reinforcement Learning

Haotian Zhao, Songlin Zhou, Yuxin Zhang, Stephen S.-T. Yau, Wenyu Zhang, Lun Tian, Tianshu Zhu, Yifeng Huang, Yucheng Zeng, Jingnan Gu, Daxiang Dong, and Jianmin Wu

arXiv:2605.00425·cs.AI·May 11, 2026

TL;DR

AEM is a supervision-free method that adaptively modulates entropy during RL training to improve the exploration-exploitation trade-off in multi-turn LLM agent tasks.

Contribution

It proposes a response-level entropy modulation technique that aligns uncertainty estimation with LLM action granularity, reducing reliance on dense supervision.
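To make "response-level" concrete, here is a minimal sketch of one way to move an uncertainty estimate from token granularity to response granularity: compute the Shannon entropy of the policy's next-token distribution at each position, then average over the tokens that belong to the same response. This is an illustration only, not the paper's formulation. The function names, the `response_ids` layout, and the choice of a plain mean are all assumptions made for the example.

```python
import numpy as np


def token_entropies(logits):
    """Shannon entropy (in nats) of the softmax distribution at each token position.

    logits: array of shape (num_tokens, vocab_size).
    """
    logits = np.asarray(logits, dtype=np.float64)
    # Subtract the row-wise max for a numerically stable log-softmax.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    probs = np.exp(log_probs)
    return -(probs * log_probs).sum(axis=-1)


def response_entropies(logits, response_ids):
    """Mean token entropy per response (hypothetical aggregation, assumed here).

    response_ids: integer array of shape (num_tokens,), where response_ids[t]
    is the index of the response (turn) that token t belongs to.
    """
    ent = token_entropies(logits)
    response_ids = np.asarray(response_ids, dtype=np.int64)
    num_responses = int(response_ids.max()) + 1
    sums = np.bincount(response_ids, weights=ent, minlength=num_responses)
    counts = np.bincount(response_ids, minlength=num_responses)
    # Guard against responses with no tokens.
    return sums / np.maximum(counts, 1)


if __name__ == "__main__":
    # Two tokens with a uniform distribution (response 0), one sharply peaked token (response 1).
    demo_logits = np.array([
        [0.0, 0.0, 0.0, 0.0],
        [1.0, 1.0, 1.0, 1.0],
        [100.0, 0.0, 0.0, 0.0],
    ])
    print(response_entropies(demo_logits, [0, 0, 1]))
```

In this toy input, the first response consists of maximally uncertain tokens and the second of a near-deterministic one, so the per-response values differ sharply. Other aggregations (sum, or length-normalized variants) would be equally plausible choices for a sketch like this.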

Findings

01 AEM improves RL performance across multiple benchmarks.

02 AEM achieves a +1.4% gain in software engineering RL tasks.

03 AEM effectively balances exploration and exploitation without additional supervision.

Abstract

Reinforcement learning (RL) has substantially improved the ability of large language model (LLM) agents to interact with environments and solve multi-turn tasks. However, effective agentic RL remains challenging: sparse outcome-only rewards provide limited guidance for assigning credit to individual steps within long interaction trajectories. Existing approaches often introduce dense intermediate supervision, such as process reward models or auxiliary self-supervised signals, which increases supervision and tuning complexity and may limit generalization across tasks and domains. We present AEM, a supervision-free credit assignment method that adaptively modulates entropy dynamics during RL training to improve the exploration-exploitation trade-off. Since in agentic RL the environment is typically affected by a complete response, rather than an individual token, our analysis lifts…
