Breaking the Impasse: Dual-Scale Evolutionary Policy Training for Social Language Agents
Minzheng Wang, Run Luo, Yanbo Wang, Zichen Liu, Yuqiao Tan, Tao Tan, Xu Nan, Yinhe Zheng, Wenji Mao

TL;DR
This paper introduces DEPT, a novel method for social language agents that detects and overcomes evolution impasse, enabling continuous strategic development in open-ended language games.
Contribution
DEPT employs dual-scale perception and advantage reshaping to prevent policy collapse, facilitating ongoing evolution in social language agents.
Findings
DEPT outperforms baseline methods in social language games.
It effectively detects and mitigates evolution impasse.
DEPT maintains strategic diversity and continuous evolution.
Abstract
While Reinforcement Learning with Verifiable Rewards (RLVR) has proven effective for closed-ended tasks, extending it to open-ended social language games via self-play reveals a critical issue: evolution impasse. Due to the vast strategy space, language agents frequently converge to homogenized behaviors, leading to deterministic match outcomes that eliminate the gradient signals necessary for policy evolution. To tackle this issue, we propose Dual-scale Evolutionary Policy Training (DEPT) for social language games. DEPT introduces a time-scaled evolutionary perception mechanism that detects impasse by quantifying dual-scale value baseline divergence alongside match entropy. Upon perceiving the collapse, it then activates asymmetric advantage reshaping to dynamically modulate the optimization landscape for intervention. Thus, our method effectively restores gradient signals and enforces…
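To make the impasse concrete, the following is a minimal sketch and not the paper's formulation. It shows that when every match in a batch ends the same way, the outcome entropy is zero and advantages taken against a batch-mean baseline vanish, so no gradient signal remains. It also tracks a fast and a slow moving average of the mean reward as one possible reading of a "dual-scale value baseline". All names (`match_entropy`, `centered_advantages`, `DualScaleBaseline`), the use of exponential moving averages, and the rates 0.5 and 0.05 are assumptions made for illustration. The abstract does not specify how the divergence is computed or thresholded.

```python
import math
from collections import Counter


def match_entropy(outcomes):
    """Shannon entropy (in nats) of the empirical match-outcome distribution."""
    counts = Counter(outcomes)
    total = len(outcomes)
    return sum(-(c / total) * math.log(c / total) for c in counts.values())


def centered_advantages(rewards):
    """Advantages measured against the batch-mean reward (a common baseline choice)."""
    baseline = sum(rewards) / len(rewards)
    return [r - baseline for r in rewards]


class DualScaleBaseline:
    """Hypothetical tracker: fast and slow moving averages of the mean reward.

    The EMA form and both rates are illustrative assumptions, not taken
    from the paper.
    """

    def __init__(self, fast_rate=0.5, slow_rate=0.05):
        self.fast_rate = fast_rate
        self.slow_rate = slow_rate
        self.fast = None
        self.slow = None

    def update(self, mean_reward):
        """Fold in one batch's mean reward and return |fast - slow|."""
        if self.fast is None:
            # First observation initializes both time scales.
            self.fast = self.slow = mean_reward
        else:
            self.fast += self.fast_rate * (mean_reward - self.fast)
            self.slow += self.slow_rate * (mean_reward - self.slow)
        return abs(self.fast - self.slow)


if __name__ == "__main__":
    collapsed = ["win"] * 4          # deterministic match outcomes
    mixed = ["win", "loss"] * 2      # balanced outcomes

    print(match_entropy(collapsed), match_entropy(mixed))
    print(centered_advantages([1.0] * 4))   # all zeros: no learning signal
    print(centered_advantages([1.0, 0.0]))  # non-zero: signal present
```

In this sketch the collapsed batch yields zero entropy and all-zero advantages, while the mixed batch yields an entropy of ln 2 and non-zero advantages. How DEPT combines such signals into an impasse decision, and how it reshapes advantages afterwards, is not recoverable from the truncated abstract.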
