Never Explore Repeatedly in Multi-Agent Reinforcement Learning
Chenghao Li, Tonghan Wang, Chongjie Zhang, Qianchuan Zhao

TL;DR
This paper addresses the revisitation problem in multi-agent reinforcement learning, in which the limited expressiveness of neural approximators leads agents to re-explore the same regions. It proposes a dynamic reward scaling method to broaden exploration and improve performance in complex environments.
Contribution
The paper introduces a dynamic reward scaling technique that mitigates the revisitation caused by the limited expressiveness of neural approximators in multi-agent RL.
Findings
Improved exploration in Google Research Football and StarCraft II tasks.
Enhanced performance in sparse reward environments.
Effective stabilization of intrinsic reward fluctuations.
Abstract
In the realm of multi-agent reinforcement learning, intrinsic motivations have emerged as a pivotal tool for exploration. While the computation of many intrinsic rewards relies on estimating variational posteriors using neural network approximators, a notable challenge has surfaced due to the limited expressive capability of these neural statistics approximators. We pinpoint this challenge as the "revisitation" issue, where agents recurrently explore confined areas of the task space. To combat this, we propose a dynamic reward scaling approach. This method is crafted to stabilize the significant fluctuations in intrinsic rewards in previously explored areas and promote broader exploration, effectively curbing the revisitation phenomenon. Our experimental findings underscore the efficacy of our approach, showcasing enhanced performance in demanding environments like Google Research…
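The abstract does not state the scaling rule itself, so the following is only a minimal sketch of one way an intrinsic reward could be damped in regions that have already been visited. The count-based 1/sqrt(n) rule, the class name VisitScaledIntrinsicReward, and the parameter beta are all assumptions made for illustration; they are not taken from the paper and should not be read as the authors' method.

```python
import math
from collections import defaultdict


class VisitScaledIntrinsicReward:
    """Hypothetical illustration: damp an intrinsic reward by visit counts.

    This is NOT the paper's algorithm. It only shows the general idea of
    shrinking the intrinsic bonus where the agents have already been, so
    that novel regions stay comparatively more attractive.
    """

    def __init__(self, beta=1.0):
        # beta is an assumed global weight on the intrinsic term.
        self.beta = beta
        self.counts = defaultdict(int)

    def scale(self, state_key, raw_intrinsic):
        # Record the visit, then divide the raw bonus by sqrt(visit count).
        self.counts[state_key] += 1
        n = self.counts[state_key]
        return self.beta * raw_intrinsic / math.sqrt(n)

    def total_reward(self, state_key, extrinsic, raw_intrinsic):
        # Training reward = environment reward + scaled intrinsic bonus.
        return extrinsic + self.scale(state_key, raw_intrinsic)


if __name__ == "__main__":
    scaler = VisitScaledIntrinsicReward(beta=1.0)
    # state_key stands in for any hashable discretization of a joint state.
    for step in range(4):
        print(step, scaler.scale("region_a", 2.0))
    print("novel", scaler.scale("region_b", 2.0))
```

With this assumed rule, repeated visits to the same key yield a steadily shrinking bonus, while a key seen for the first time still receives the full raw value. A real system would need a state abstraction suited to continuous multi-agent observations, which this sketch leaves out.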
Taxonomy
Topics: Reinforcement Learning in Robotics · Sports Analytics and Performance
