TL;DR
This paper introduces an algorithm that extends policy optimization methods to asynchronous multi-agent decision problems with macro-actions, enabling effective learning in event-driven, stochastic environments.
Contribution
It presents a novel approach that modifies generalized advantage estimation for macro-actions, allowing optimization in asynchronous, event-driven multi-agent systems.
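To make the idea concrete, here is a minimal sketch of how generalized advantage estimation might be adapted when each decision lasts a variable amount of time. It is not taken from the paper. It assumes that the discount between consecutive decision events is `gamma` raised to the macro-action's duration, and that the GAE weight `lam` is exponentiated in the same way. The function name `macro_action_gae` and all of its parameters are hypothetical.

```python
def macro_action_gae(rewards, values, durations, bootstrap_value,
                     gamma=0.99, lam=0.95):
    """Advantage estimates for one agent's sequence of macro-actions.

    Assumed (hypothetical) inputs:
      rewards[k]      reward accumulated while macro-action k was running
      values[k]       critic estimate at the k-th decision event
      durations[k]    elapsed time of macro-action k
      bootstrap_value critic estimate after the last macro-action
    """
    n = len(rewards)
    advantages = [0.0] * n
    next_value = bootstrap_value
    next_advantage = 0.0
    # Walk backwards so each step can reuse the advantage that follows it.
    for k in reversed(range(n)):
        # Assumption: discount by elapsed time, not by a fixed single step.
        discount = gamma ** durations[k]
        delta = rewards[k] + discount * next_value - values[k]
        advantages[k] = delta + discount * (lam ** durations[k]) * next_advantage
        next_value = values[k]
        next_advantage = advantages[k]
    return advantages
```

When every duration equals 1, this sketch reduces to standard generalized advantage estimation; longer macro-actions discount the bootstrapped value and the trailing advantage more heavily.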
Findings
Learned optimal policies in two cooperative domains: real-time bus holding control and wildfire fighting with unmanned aircraft.
Demonstrated advantages of event-driven simulation over fixed time-step methods.
Showed that fixed time-step simulation scales poorly as the number of agents increases.
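The contrast between event-driven and fixed time-step simulation can be illustrated with a small sketch. It is not the paper's simulator. It assumes that each agent's macro-action lasts an exponentially distributed time and that an agent needs a new decision only when its macro-action ends. The function name `event_driven_decision_times` and its parameters are hypothetical.

```python
import heapq
import random


def event_driven_decision_times(num_agents, horizon, mean_duration=5.0, seed=0):
    """Return (time, agent) pairs at which agents must choose a new macro-action."""
    rng = random.Random(seed)
    # One pending "macro-action finished" event per agent, ordered by time.
    queue = [(rng.expovariate(1.0 / mean_duration), agent)
             for agent in range(num_agents)]
    heapq.heapify(queue)
    decisions = []
    while queue:
        time, agent = heapq.heappop(queue)
        if time > horizon:
            break  # min-heap: every remaining event is later still
        decisions.append((time, agent))
        # Assumption: the next macro-action's duration is exponential.
        next_time = time + rng.expovariate(1.0 / mean_duration)
        heapq.heappush(queue, (next_time, agent))
    return decisions
```

In this sketch the simulator jumps directly from one decision event to the next, so the work done depends on how many macro-actions finish, not on how finely time is discretized. A fixed time-step loop would instead visit every agent at every step, whether or not that agent needs to act.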
Abstract
The incorporation of macro-actions (temporally extended actions) into multi-agent decision problems has the potential to address the curse of dimensionality associated with such decision problems. Since macro-actions last for stochastic durations, multiple agents executing decentralized policies in cooperative environments must act asynchronously. We present an algorithm that modifies generalized advantage estimation for temporally extended actions, allowing a state-of-the-art policy optimization algorithm to optimize policies in Dec-POMDPs in which agents act asynchronously. We show that our algorithm is capable of learning optimal policies in two cooperative domains, one involving real-time bus holding control and one involving wildfire fighting with unmanned aircraft. Our algorithm works by framing problems as "event-driven decision processes," which are scenarios in which the…
