TL;DR
This paper introduces an optimistic exploration strategy for cooperative multi-agent reinforcement learning, addressing value underestimation and improving convergence to optimal solutions.
Contribution
It proposes a novel optimistic $\epsilon$-greedy exploration method with theoretical convergence guarantees, enhancing performance over existing algorithms.
Findings
Prevents algorithms from converging to suboptimal solutions.
Significantly improves final returns, win rates, and convergence speeds.
Effective in various environments with cooperative multi-agent tasks.
Abstract
The Centralized Training with Decentralized Execution (CTDE) paradigm is widely used in cooperative multi-agent reinforcement learning. However, conventional methods based on CTDE can suffer from value underestimation and converge to suboptimal solutions. While such underestimation is typically attributed to the representational limitations of monotonic structures, we provide a novel perspective by demonstrating that the insufficient sampling of optimal joint actions during exploration is also a critical factor. To address this problem, we propose Optimistic $\epsilon$-Greedy Exploration. Our method introduces optimistic action-value networks that serve as decoupled exploration indicators, which we theoretically prove to converge in probability to the maximum achievable returns. By sampling actions from these distributions with a probability of $\epsilon$, we effectively increase the…
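The action-selection rule described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract is truncated and does not say how the optimistic estimates are turned into a sampling distribution, so the softmax used here is an assumption, and the names `softmax`, `optimistic_epsilon_greedy`, `q_values`, and `optimistic_values` are hypothetical. The sketch covers a single agent's choice only and omits training of the optimistic networks.

```python
import math
import random


def softmax(values, temperature=1.0):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp((v - m) / temperature) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def optimistic_epsilon_greedy(q_values, optimistic_values, epsilon, rng=random):
    """Choose one agent's action index.

    With probability epsilon, sample from a softmax over the optimistic
    estimates (ASSUMED form of the exploration distribution); otherwise
    act greedily on the ordinary utility estimates.
    """
    if rng.random() < epsilon:
        probs = softmax(optimistic_values)
        return rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return max(range(len(q_values)), key=lambda a: q_values[a])


if __name__ == "__main__":
    rng = random.Random(0)
    q = [0.2, 0.5, 0.1]    # ordinary estimates: action 1 looks best
    opt = [0.2, 0.5, 3.0]  # optimistic estimates: action 2 may be better
    counts = [0, 0, 0]
    for _ in range(1000):
        counts[optimistic_epsilon_greedy(q, opt, 0.3, rng)] += 1
    print(counts)
```

Compared with uniform $\epsilon$-greedy, the exploratory branch here concentrates samples on actions whose optimistic estimate is high, which is the mechanism the abstract credits with sampling optimal joint actions more often.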
