Prioritized Guidance for Efficient Multi-Agent Reinforcement Learning Exploration
Qisheng Wang, Qichao Wang

TL;DR
This paper introduces a novel communication-guided approach with predictive reward modeling and prioritized experience replay to enhance exploration efficiency in multi-agent reinforcement learning, demonstrating superior performance in cooperative tasks.
Contribution
It proposes a new communication method, a predictive reward network, and an improved prioritized experience replay to accelerate learning in MARL, with potential extension to supervised learning.
Findings
Outperforms existing MARL methods in cooperative environments
Enhances exploration efficiency through guided communication and reward prediction
Utilizes improved prioritized experience replay for better knowledge sharing
Abstract
Exploration efficiency is a challenging problem in multi-agent reinforcement learning (MARL), as the policy learned by cooperative MARL depends on how multiple agents collaborate. Another important problem is that the less informative reward restricts the learning speed of MARL, compared with the informative labels of supervised learning. In this work, we leverage a novel communication method to guide MARL and accelerate exploration. We also propose a predictive network that forecasts the reward of the current state-action pair, and we use the guidance learned by this network to modify the reward function. An improved prioritized experience replay, which uses the temporal-difference (TD) error more effectively, is employed to take better advantage of the different knowledge learned by different agents. Experimental results demonstrate that the proposed algorithm outperforms existing…
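For readers unfamiliar with the replay mechanism the abstract builds on, the following is a minimal sketch of standard proportional prioritized experience replay, in which transitions with larger absolute TD error are sampled more often. It is not the paper's improved variant, whose details are not given in this summary. The class name `PrioritizedReplay` and the parameters `alpha` and `eps` are illustrative assumptions.

```python
import random


class PrioritizedReplay:
    """Standard proportional prioritization (hypothetical sketch, not the paper's variant).

    Each transition i gets priority p_i = (|td_error_i| + eps) ** alpha and is
    sampled with probability p_i / sum_k p_k.
    """

    def __init__(self, capacity, alpha=0.6, eps=1e-3):
        self.capacity = capacity
        self.alpha = alpha      # 0 gives uniform sampling, 1 gives fully proportional
        self.eps = eps          # keeps zero-error transitions sampleable
        self.items = []
        self.priorities = []
        self.pos = 0            # next slot to overwrite once the buffer is full

    def _priority(self, td_error):
        return (abs(td_error) + self.eps) ** self.alpha

    def add(self, transition, td_error):
        # Append until full, then overwrite the oldest slot (ring buffer).
        if len(self.items) < self.capacity:
            self.items.append(transition)
            self.priorities.append(self._priority(td_error))
        else:
            self.items[self.pos] = transition
            self.priorities[self.pos] = self._priority(td_error)
        self.pos = (self.pos + 1) % self.capacity

    def probabilities(self):
        total = sum(self.priorities)
        return [p / total for p in self.priorities]

    def sample(self, batch_size, rng=random):
        # Sample indices with replacement, weighted by priority.
        idx = rng.choices(range(len(self.items)), weights=self.priorities, k=batch_size)
        return idx, [self.items[i] for i in idx]

    def update(self, indices, td_errors):
        # Refresh priorities after the learner recomputes TD errors.
        for i, err in zip(indices, td_errors):
            self.priorities[i] = self._priority(err)
```

In a multi-agent setting, one plausible arrangement (an assumption here, not something the summary confirms) is for all agents to write into a single shared buffer so that high-error transitions from any agent are replayed more often by every learner.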
Taxonomy
Topics: Reinforcement Learning in Robotics · Data Stream Mining Techniques · Advanced Bandit Algorithms Research
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Prioritized Experience Replay · Experience Replay
