Macro-Action-Based Multi-Agent/Robot Deep Reinforcement Learning under Partial Observability
Yuchen Xiao

TL;DR
This paper introduces macro-action-based deep reinforcement learning methods for multi-agent systems operating under partial observability, enabling asynchronous decision-making and improving scalability in complex real-world tasks.
Contribution
The thesis develops value-based and policy-gradient RL algorithms for MacDec-POMDPs, allowing agents in multi-agent reinforcement learning to make macro-action decisions asynchronously.
Findings
The proposed algorithms outperform existing methods on large multi-agent problems.
They are effective in both simulation and real-robot experiments.
With macro-actions, they scale well and produce high-quality solutions.
Abstract
State-of-the-art multi-agent reinforcement learning (MARL) methods have provided promising solutions to a variety of complex problems. Yet these methods all assume that agents execute primitive actions synchronously, so they do not genuinely scale to long-horizon, real-world multi-agent/robot tasks, which inherently require agents/robots to reason asynchronously about high-level action selection over varying time durations. The Macro-Action Decentralized Partially Observable Markov Decision Process (MacDec-POMDP) is a general formalization for asynchronous decision-making under uncertainty in fully cooperative multi-agent tasks. In this thesis, we first propose a group of value-based RL approaches for MacDec-POMDPs, where agents are allowed to perform asynchronous learning and decision-making with macro-action-value functions in three paradigms: decentralized learning…
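The asynchronous setting described in the abstract can be illustrated with a toy sketch. This is not the thesis's algorithm: the names `MacroAction`, `Agent`, and `run_async`, the fixed integer durations, and the uniformly random macro-action choice are all assumptions made for illustration. The only point it shows is that each agent picks a new macro-action when its own current one terminates, so decision points differ across agents.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class MacroAction:
    name: str
    duration: int  # hypothetical: primitive steps until termination (>= 1)


@dataclass
class Agent:
    name: str
    current: Optional[MacroAction] = None
    remaining: int = 0  # primitive steps left in the current macro-action


def run_async(agents: List[Agent], options: List[MacroAction],
              horizon: int, seed: int = 0) -> List[Tuple[int, str, str]]:
    """Step all agents through time; return (step, agent, macro-action) decisions."""
    rng = random.Random(seed)
    log = []
    for t in range(horizon):
        for agent in agents:
            # An agent decides only when its own macro-action has terminated,
            # independently of what the other agents are doing.
            if agent.remaining == 0:
                agent.current = rng.choice(options)  # placeholder for a learned policy
                agent.remaining = agent.current.duration
                log.append((t, agent.name, agent.current.name))
            agent.remaining -= 1  # execute one primitive step
    return log


# Usage: two robots with macro-actions of different lengths.
options = [MacroAction("go_to_A", 2), MacroAction("go_to_B", 5)]
for step, who, what in run_async([Agent("r1"), Agent("r2")], options, horizon=12):
    print(step, who, what)
```

In a learned system the random choice would be replaced by a policy or macro-action-value function conditioned on the agent's own history; the sketch only captures the timing structure.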
Taxonomy
Topics: Reinforcement Learning in Robotics
