Conditionally Optimistic Exploration for Cooperative Deep Multi-Agent Reinforcement Learning

Xutong Zhao; Yangchen Pan; Chenjun Xiao; Sarath Chandar; Janarthanan Rajendran

arXiv:2303.09032·cs.LG·July 17, 2023·1 cites

TL;DR

This paper introduces Conditionally Optimistic Exploration (COE), a novel exploration method for cooperative multi-agent reinforcement learning that leverages sequential action computation and tree search principles to improve exploration efficiency.

Contribution

The paper proposes COE, a new exploration approach inspired by UCT, which enhances cooperative exploration in MARL by incorporating optimistic bonuses based on sequential agent actions.

Findings

01. COE outperforms existing exploration methods on challenging MARL benchmarks.

02. COE is compatible with various value decomposition methods for centralized training.

03. The method effectively captures structured dependencies among agents during exploration.

Abstract

Efficient exploration is critical in cooperative deep Multi-Agent Reinforcement Learning (MARL). In this work, we propose an exploration method that effectively encourages cooperative exploration based on the idea of a sequential action-computation scheme. The high-level intuition is that, when performing optimism-based exploration, agents will explore cooperative strategies if each agent's optimism estimate captures a structured dependency relationship with other agents. Assuming agents compute actions following a sequential order at each environment timestep, we provide a perspective to view MARL as tree search iterations by considering agents as nodes at different depths of the search tree. Inspired by the theoretically justified tree search algorithm UCT (Upper Confidence bounds applied to Trees), we develop a method called Conditionally Optimistic Exploration (COE). COE augments…
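The abstract is truncated before it states the exact form of the bonus, so the following is only a minimal tabular sketch of the general idea it describes: agents pick actions in a fixed order, and each agent's optimism bonus depends on the actions already chosen by the agents before it. The class `ConditionalCountBonus`, the function `select_joint_action`, the constant `c`, and the `c / sqrt(N + 1)` bonus are all illustrative assumptions, not the paper's implementation, which targets deep MARL rather than lookup tables.

```python
import math
from collections import defaultdict


class ConditionalCountBonus:
    """Hypothetical tabular counter keyed by (state, prefix of chosen actions)."""

    def __init__(self, c=1.0):
        self.c = c                      # assumed exploration constant
        self.counts = defaultdict(int)  # visit counts per (state, action prefix)

    def bonus(self, state, prefix):
        # Assumed count-based bonus: shrinks as this prefix is visited more often.
        return self.c / math.sqrt(self.counts[(state, prefix)] + 1)

    def update(self, state, joint_action):
        # Increment the count of every prefix, i.e. every node on the tree path.
        for depth in range(1, len(joint_action) + 1):
            self.counts[(state, tuple(joint_action[:depth]))] += 1


def select_joint_action(q_values, state, counter):
    """Pick actions agent by agent; q_values[i][a] is agent i's value estimate."""
    prefix = ()
    for q_i in q_values:
        # Each agent's score is conditioned on the actions of preceding agents.
        scores = [q + counter.bonus(state, prefix + (a,)) for a, q in enumerate(q_i)]
        best = max(range(len(scores)), key=scores.__getitem__)
        prefix += (best,)
    return prefix


if __name__ == "__main__":
    counter = ConditionalCountBonus(c=1.0)
    q = [[0.0, 0.0], [0.0, 0.0]]  # two agents, two actions, uninformative values
    for _ in range(4):
        joint = select_joint_action(q, "s0", counter)
        counter.update("s0", joint)
        print(joint)
```

With uninformative value estimates, the conditional counts alone steer the two agents toward joint actions they have not yet tried together, which is the cooperative-exploration effect the abstract describes at a high level.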

Code & Models

Repositories

chandar-lab/coe (PyTorch, official)

Taxonomy

Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning