Prioritized Guidance for Efficient Multi-Agent Reinforcement Learning Exploration

Qisheng Wang; Qichao Wang

arXiv:1907.07847·cs.LG·December 30, 2019·1 cites

Open Access

TL;DR

This paper introduces a novel communication-guided approach with predictive reward modeling and prioritized experience replay to enhance exploration efficiency in multi-agent reinforcement learning, demonstrating superior performance in cooperative tasks.

Contribution

It proposes a new communication method, a predictive reward network, and an improved prioritized experience replay to accelerate learning in MARL, with potential extension to supervised learning.

Findings

01

Outperforms existing MARL methods in cooperative environments

02

Enhances exploration efficiency through guided communication and reward prediction

03

Utilizes improved prioritized experience replay for better knowledge sharing
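For orientation, the sketch below shows the standard proportional form of prioritized experience replay, in which a transition's sampling probability grows with the magnitude of its temporal-difference (TD) error. It is not the paper's improved variant, whose details are not given on this page. The class name `PrioritizedReplay`, the parameters `alpha`, `beta`, and `eps`, and the example transitions are all illustrative assumptions.

```python
import random


class PrioritizedReplay:
    """Standard proportional prioritized replay (NOT the paper's improved variant)."""

    def __init__(self, capacity, alpha=0.6, eps=1e-6, seed=0):
        self.capacity = capacity      # maximum number of stored transitions
        self.alpha = alpha            # 0 = uniform sampling, 1 = fully prioritized
        self.eps = eps                # keeps every priority strictly positive
        self.data = []
        self.priorities = []
        self.pos = 0                  # next slot to overwrite once the buffer is full
        self.rng = random.Random(seed)

    def _priority(self, td_error):
        return (abs(td_error) + self.eps) ** self.alpha

    def add(self, transition, td_error):
        """Store a transition; overwrite the oldest one when full."""
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priorities.append(self._priority(td_error))
        else:
            self.data[self.pos] = transition
            self.priorities[self.pos] = self._priority(td_error)
        self.pos = (self.pos + 1) % self.capacity

    def probabilities(self):
        """Sampling probability of each stored transition."""
        total = sum(self.priorities)
        return [p / total for p in self.priorities]

    def sample(self, batch_size, beta=0.4):
        """Return indices, transitions, and importance-sampling weights."""
        probs = self.probabilities()
        n = len(self.data)
        idx = self.rng.choices(range(n), weights=probs, k=batch_size)
        weights = [(n * probs[i]) ** (-beta) for i in idx]
        max_w = max(weights)          # normalize by the largest weight in the batch
        return idx, [self.data[i] for i in idx], [w / max_w for w in weights]

    def update(self, indices, td_errors):
        """Refresh priorities after the learner recomputes TD errors."""
        for i, err in zip(indices, td_errors):
            self.priorities[i] = self._priority(err)


# Hypothetical usage: the transitions are placeholder strings.
buf = PrioritizedReplay(capacity=3)
buf.add("t0", td_error=0.1)
buf.add("t1", td_error=1.0)
buf.add("t2", td_error=2.0)
indices, batch, is_weights = buf.sample(batch_size=2)
```

In a multi-agent setting, one plausible arrangement is for several agents to write into a shared buffer of this kind so that high-error transitions from any agent are replayed more often; whether the paper does this is not stated here.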

Abstract

Exploration efficiency is a challenging problem in multi-agent reinforcement learning (MARL), because the policy learned by cooperative MARL depends on how the agents collaborate. Another important problem is that the reward is less informative than the label in supervised learning, which restricts the learning speed of MARL. In this work, we leverage a novel communication method to guide MARL and accelerate exploration, and we propose a predictive network that forecasts the reward of the current state-action pair; the guidance learned by the predictive network is used to modify the reward function. An improved prioritized experience replay, which uses the temporal-difference (TD) error more effectively, is employed to take better advantage of the different knowledge learned by different agents. Experimental results demonstrate that the proposed algorithm outperforms existing…
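The abstract does not say how the predicted reward modifies the reward function, so the following is only a minimal sketch under stated assumptions: a linear predictor stands in for the predictive network, it is trained by stochastic gradient descent on squared error, and the shaped reward is the environment reward plus a scaled prediction. The function names, the feature vectors, the learning rate `lr`, and the mixing coefficient `scale` are all hypothetical.

```python
def predict(weights, features):
    """Predicted reward for one state-action feature vector (linear stand-in)."""
    return sum(w * x for w, x in zip(weights, features))


def sgd_step(weights, features, reward, lr=0.1):
    """One gradient step on 0.5 * (prediction - reward) ** 2."""
    error = predict(weights, features) - reward
    return [w - lr * error * x for w, x in zip(weights, features)]


def shaped_reward(reward, weights, features, scale=0.5):
    """ASSUMED shaping rule: environment reward plus a scaled predicted reward."""
    return reward + scale * predict(weights, features)


# Hypothetical data: two one-hot state-action encodings with observed rewards.
samples = [([1.0, 0.0], 1.0), ([0.0, 1.0], 0.0)]

weights = [0.0, 0.0]
for _ in range(200):
    for features, reward in samples:
        weights = sgd_step(weights, features, reward)

# A sparse-reward step (environment reward 0) still receives guidance
# when the predictor expects this state-action pair to be rewarding.
guided = shaped_reward(0.0, weights, [1.0, 0.0])
```

An additive rule is chosen here only because it is the simplest way to inject a learned signal into a sparse reward; the paper may combine the two quantities differently.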

Taxonomy

Topics: Reinforcement Learning in Robotics · Data Stream Mining Techniques · Advanced Bandit Algorithms Research

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Prioritized Experience Replay · Experience Replay