Conditionally Optimistic Exploration for Cooperative Deep Multi-Agent Reinforcement Learning
Xutong Zhao, Yangchen Pan, Chenjun Xiao, Sarath Chandar, Janarthanan Rajendran

TL;DR
This paper introduces Conditionally Optimistic Exploration (COE), a novel exploration method for cooperative multi-agent reinforcement learning that leverages sequential action computation and tree search principles to improve exploration efficiency.
Contribution
The paper proposes COE, a new exploration approach inspired by UCT, which enhances cooperative exploration in MARL by incorporating optimistic bonuses based on sequential agent actions.
Findings
COE outperforms existing exploration methods on challenging MARL benchmarks.
COE is compatible with various value decomposition methods for centralized training.
The method effectively captures structured dependencies among agents during exploration.
Abstract
Efficient exploration is critical in cooperative deep Multi-Agent Reinforcement Learning (MARL). In this work, we propose an exploration method that effectively encourages cooperative exploration based on the idea of a sequential action-computation scheme. The high-level intuition is that, when performing optimism-based exploration, agents would explore cooperative strategies if each agent's optimism estimate captures a structured dependency relationship with other agents. Assuming agents compute actions following a sequential order at each environment timestep, we provide a perspective to view MARL as tree search iterations by considering agents as nodes at different depths of the search tree. Inspired by the theoretically justified tree search algorithm UCT (Upper Confidence bounds applied to Trees), we develop a method called Conditionally Optimistic Exploration (COE). COE augments…
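The tree-search view described in the abstract can be illustrated with a small tabular sketch. This is not the authors' implementation: the abstract is truncated before it states how COE forms its bonus, and the paper targets deep MARL rather than lookup tables. The class name `ConditionalCountBonus`, the constant `c`, and the count table are hypothetical. The sketch only assumes the standard UCT-style bonus c * sqrt(ln N(parent) / N(child)), applied so that each agent's bonus is conditioned on the actions already chosen by the agents before it in the order.

```python
import math
from collections import defaultdict


class ConditionalCountBonus:
    """Hypothetical tabular sketch: UCT-style bonuses over sequentially chosen actions."""

    def __init__(self, n_agents, n_actions, c=1.0):
        self.n_agents = n_agents
        self.n_actions = n_actions
        self.c = c  # exploration constant (an assumption, not a value from the paper)
        # counts[(state, prefix)]: visits to the tree node reached when the first
        # len(prefix) agents chose the actions in `prefix` at `state`.
        self.counts = defaultdict(int)

    def bonus(self, state, prefix, action):
        """UCB1-style bonus for the next agent, conditioned on preceding agents' actions."""
        parent = self.counts.get((state, prefix), 0)
        child = self.counts.get((state, prefix + (action,)), 0)
        if child == 0:
            return math.inf  # untried branches are maximally optimistic
        return self.c * math.sqrt(math.log(parent) / child)

    def select_actions(self, state, q_values):
        """Agents act in a fixed order; agent i sits at depth i of the tree."""
        prefix = ()
        for i in range(self.n_agents):
            scores = [q_values[i][a] + self.bonus(state, prefix, a)
                      for a in range(self.n_actions)]
            best = max(range(self.n_actions), key=scores.__getitem__)
            prefix += (best,)
        return prefix

    def update(self, state, joint_action):
        """Increment the visit count of every node on the path, root included."""
        for k in range(len(joint_action) + 1):
            self.counts[(state, tuple(joint_action[:k]))] += 1


# Usage: two agents, two actions each, uninformative value estimates.
explorer = ConditionalCountBonus(n_agents=2, n_actions=2)
q = [[0.0, 0.0], [0.0, 0.0]]
visited = []
for _ in range(4):
    joint = explorer.select_actions("s0", q)
    explorer.update("s0", joint)
    visited.append(joint)
```

With all value estimates at zero, the conditional counts steer the two agents to a different joint action on each of the four steps, so the whole joint action space of this toy problem is covered. A bonus computed from each agent's own action counts alone would not distinguish between joint actions that share the same individual action.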
Taxonomy
Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning
