Decentralized Graph-Based Multi-Agent Reinforcement Learning Using Reward Machines
Jueming Hu, Zhe Xu, Weichang Wang, Guannan Qu, Yutian Pang, and Yongming Liu

TL;DR
This paper introduces a decentralized graph-based reinforcement learning method using reward machines for multi-agent systems, enabling agents to learn complex tasks efficiently with local information, demonstrated through UAV delivery and pandemic mitigation case studies.
Contribution
It proposes the DGRM algorithm that combines reward machines with decentralized actor-critic methods, reducing complexity and enabling scalable multi-agent learning.
Findings
In the case studies, DGRM improves reward by up to 119%.
Local information suffices for agents to learn complex tasks effectively.
An agent's Q-function depends less on other agents the farther away they are in the graph, and this dependency decreases exponentially with distance.
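The last finding can be written out as a bound. The form below is a sketch borrowed from the exponential decay property commonly used in work on networked MARL; it is an assumption about how the statement might be formalized, not a formula taken from this page. The symbols are illustrative: $Q_i$ is agent $i$'s local Q-function, $N_i^{\kappa}$ is the set of agents within $\kappa$ hops of agent $i$, $N_{-i}^{\kappa}$ is everyone else, and $c > 0$ and $\rho \in (0, 1)$ are constants.

```latex
\[
\Bigl|\,
Q_i\bigl(s_{N_i^{\kappa}},\, s_{N_{-i}^{\kappa}},\, a_{N_i^{\kappa}},\, a_{N_{-i}^{\kappa}}\bigr)
-
Q_i\bigl(s_{N_i^{\kappa}},\, s'_{N_{-i}^{\kappa}},\, a_{N_i^{\kappa}},\, a'_{N_{-i}^{\kappa}}\bigr)
\,\Bigr|
\;\le\; c\,\rho^{\kappa+1}
\]
```

Read this way, changing the states and actions of agents more than $\kappa$ hops away moves $Q_i$ by at most $c\,\rho^{\kappa+1}$, which is why a Q-function truncated to the $\kappa$-hop neighborhood can stand in for the full one.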
Abstract
In multi-agent reinforcement learning (MARL), it is challenging for a collection of agents to learn complex temporally extended tasks. The difficulties lie in computational complexity and how to learn the high-level ideas behind reward functions. We study the graph-based Markov Decision Process (MDP) where the dynamics of neighboring agents are coupled. We use a reward machine (RM) to encode each agent's task and expose reward function internal structures. RM has the capacity to describe high-level knowledge and encode non-Markovian reward functions. We propose a decentralized learning algorithm to tackle computational complexity, called decentralized graph-based reinforcement learning using reward machines (DGRM), that equips each agent with a localized policy, allowing agents to make decisions independently, based on the information available to the agents. DGRM uses the actor-critic…
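To make the reward machine idea concrete, here is a minimal Python sketch of an RM as a finite-state machine over high-level events. It is not the paper's implementation: the class `RewardMachine`, the helper `build_delivery_rm`, the state names, the event labels `"pickup"` and `"deliver"`, and the reward values are all hypothetical, chosen to loosely echo the UAV delivery case study.

```python
class RewardMachine:
    """Finite-state machine that maps high-level events to rewards (illustrative)."""

    def __init__(self, initial_state, transitions, terminal_states=()):
        # transitions: (rm_state, event) -> (next_rm_state, reward)
        self.initial_state = initial_state
        self.transitions = dict(transitions)
        self.terminal_states = frozenset(terminal_states)
        self.state = initial_state

    def reset(self):
        self.state = self.initial_state

    def step(self, event):
        # Events with no listed transition leave the RM state unchanged
        # and yield zero reward (an assumption of this sketch).
        next_state, reward = self.transitions.get(
            (self.state, event), (self.state, 0.0)
        )
        self.state = next_state
        return reward

    def done(self):
        return self.state in self.terminal_states


def build_delivery_rm():
    # Hypothetical task: pick up a package, then deliver it.
    return RewardMachine(
        initial_state="u0",
        transitions={
            ("u0", "pickup"): ("u1", 0.0),
            ("u1", "deliver"): ("u_done", 1.0),
        },
        terminal_states={"u_done"},
    )


if __name__ == "__main__":
    rm = build_delivery_rm()
    for event in ["deliver", "pickup", "deliver"]:
        print(event, rm.step(event), rm.state)
```

The same `"deliver"` event earns nothing before the pickup and a reward after it, so the reward depends on event history rather than on the current event alone. That is the non-Markovian structure the abstract says an RM encodes; tracking the RM state alongside the environment state makes the reward Markovian again.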
