Optimistic $\epsilon$-Greedy Exploration for Cooperative Multi-Agent Reinforcement Learning

Ruoning Zhang; Siying Wang; Wenyu Chen; Yang Zhou; Zhitong Zhao; Zixuan Zhang; Ruijie Zhang; Stefano V. Albrecht

arXiv:2502.03506 · cs.MA · May 5, 2026

TL;DR

This paper introduces an optimistic exploration strategy for cooperative multi-agent reinforcement learning, addressing value underestimation and improving convergence to optimal solutions.

Contribution

It proposes a novel optimistic $\boldsymbol{\epsilon}$-greedy exploration method with theoretical convergence guarantees, enhancing performance over existing algorithms.

Findings

01

Prevents algorithms from converging to suboptimal solutions.

02

Significantly improves final returns, win rates, and convergence speeds.

03

Effective in various environments with cooperative multi-agent tasks.

Abstract

The Centralized Training with Decentralized Execution (CTDE) paradigm is widely used in cooperative multi-agent reinforcement learning. However, conventional methods based on CTDE can suffer from value underestimation and converge to suboptimal solutions. While such underestimation is typically attributed to the representational limitations of monotonic structures, we provide a novel perspective by demonstrating that the insufficient sampling of optimal joint actions during exploration is also a critical factor. To address this problem, we propose Optimistic $ϵ$-Greedy Exploration. Our method introduces optimistic action-value networks that serve as decoupled exploration indicators, which we theoretically prove to converge in probability to the maximum achievable returns. By sampling actions from these distributions with a probability of $ϵ$, we effectively increase the…
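The action-selection rule described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `optimistic_epsilon_greedy`, its arguments, and the choice of a softmax over the optimistic values as the exploration distribution are all assumptions made for illustration, since the truncated abstract does not specify them. The sketch only shows the general idea that the exploration branch samples from optimistic estimates rather than uniformly at random.

```python
import math
import random


def optimistic_epsilon_greedy(q_values, q_optimistic, epsilon, rng=random):
    """Choose one agent's action index (hypothetical helper, not from the paper).

    q_values:     ordinary per-action utility estimates, used for exploitation.
    q_optimistic: optimistic per-action estimates, used only to guide exploration.
    epsilon:      probability of taking the exploration branch.
    """
    if rng.random() < epsilon:
        # Exploration: sample from a softmax over the optimistic values.
        # The softmax form is an assumption; subtracting the max keeps exp() stable.
        top = max(q_optimistic)
        weights = [math.exp(q - top) for q in q_optimistic]
        return rng.choices(range(len(weights)), weights=weights, k=1)[0]
    # Exploitation: greedy with respect to the ordinary estimates.
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

With `epsilon=0.0` the rule reduces to plain greedy selection, and with `epsilon=1.0` every action is drawn from the optimistic distribution, so actions with high optimistic value are sampled more often than under uniform exploration.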

Code & Models

Repositories

qxqxtxdy/OptimisticExploration
github
