Decentralized Graph-Based Multi-Agent Reinforcement Learning Using Reward Machines

Jueming Hu, Zhe Xu, Weichang Wang, Guannan Qu, Yutian Pang, and Yongming Liu

arXiv:2110.00096 · cs.MA · October 4, 2021

TL;DR

This paper introduces DGRM, a decentralized graph-based multi-agent reinforcement learning method in which a reward machine encodes each agent's task. Agents learn complex tasks efficiently from local information, as demonstrated in UAV delivery and pandemic mitigation case studies.

Contribution

The paper proposes the DGRM algorithm, which combines reward machines with a decentralized actor-critic method to reduce computational complexity and make multi-agent learning scalable.

Findings

01

In the case studies, DGRM improves reward by up to 119%.

02

Local information suffices for agents to learn complex tasks effectively.

03

The dependency of an agent's Q-function on other agents decays exponentially with their distance in the graph.
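
The third finding can be written as an inequality. The statement below is one common way to formalize exponential decay in networked reinforcement learning, not a quotation from the paper: the symbols $c$, $\rho$, $\kappa$, and $N_i^{\kappa}$ are assumptions introduced here for illustration, and the paper's own statement may differ in notation and constants.

```latex
% Assumed notation (not taken from this page):
%   N_i^\kappa : agents within graph distance \kappa of agent i
%   (s, a), (s', a') : two global state-action pairs that agree on N_i^\kappa
%   c > 0, \rho \in (0, 1) : decay constants
\[
  \bigl| Q_i(s, a) - Q_i(s', a') \bigr| \;\le\; c\,\rho^{\kappa + 1}
  \qquad \text{whenever } (s, a)_{N_i^\kappa} = (s', a')_{N_i^\kappa}.
\]
```

Under a bound of this kind, changing the states and actions of agents outside the $\kappa$-hop neighborhood shifts agent $i$'s Q-value by an amount that shrinks geometrically in $\kappa$, which is why a Q-function truncated to local information can remain a good approximation.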

Abstract

In multi-agent reinforcement learning (MARL), it is challenging for a collection of agents to learn complex temporally extended tasks. The difficulties lie in computational complexity and how to learn the high-level ideas behind reward functions. We study the graph-based Markov Decision Process (MDP) where the dynamics of neighboring agents are coupled. We use a reward machine (RM) to encode each agent's task and expose reward function internal structures. RM has the capacity to describe high-level knowledge and encode non-Markovian reward functions. We propose a decentralized learning algorithm to tackle computational complexity, called decentralized graph-based reinforcement learning using reward machines (DGRM), that equips each agent with a localized policy, allowing agents to make decisions independently, based on the information available to the agents. DGRM uses the actor-critic…
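
To make the idea of a reward machine concrete, here is a minimal sketch of one as a finite-state machine over high-level events. It is an illustration only, not the paper's implementation: the class `RewardMachine`, the states `u0`/`u1`/`u2`, the events `pickup`/`dropoff`, and the reward values are all hypothetical. The task is non-Markovian in the sense the abstract describes, because the same `dropoff` event earns a reward only if a `pickup` happened earlier.

```python
from dataclasses import dataclass, field


@dataclass
class RewardMachine:
    """Finite-state machine that maps high-level events to rewards (hypothetical sketch)."""

    initial_state: str
    # Maps (rm_state, event) -> (next_rm_state, reward).
    transitions: dict
    state: str = field(init=False)

    def __post_init__(self):
        self.state = self.initial_state

    def step(self, event):
        """Advance on one event and return the reward emitted for that step."""
        # An event with no listed transition leaves the state unchanged
        # and yields zero reward (an assumption of this sketch).
        next_state, reward = self.transitions.get(
            (self.state, event), (self.state, 0.0)
        )
        self.state = next_state
        return reward


def make_delivery_rm():
    """Hypothetical delivery task: pick up a package, then drop it off."""
    return RewardMachine(
        initial_state="u0",
        transitions={
            ("u0", "pickup"): ("u1", 0.0),   # package collected, no reward yet
            ("u1", "dropoff"): ("u2", 1.0),  # delivery completes the task
        },
    )


def run(events):
    """Feed a sequence of events to a fresh machine and collect the rewards."""
    rm = make_delivery_rm()
    return [rm.step(e) for e in events]
```

Calling `run(["dropoff", "pickup", "dropoff"])` shows the history dependence: the first `dropoff` is ignored because nothing has been picked up, while the second one completes the task. Pairing the machine state with the environment state is what lets a learner treat such a task as Markovian again.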
