XP-MARL: Auxiliary Prioritization in Multi-Agent Reinforcement Learning to Address Non-Stationarity
Jianye Xu, Omar Sobhy, Bassam Alrifaee

TL;DR
XP-MARL adds an auxiliary prioritization framework to multi-agent reinforcement learning: it learns priority policies and propagates the actions of higher-priority agents to stabilize training, significantly improving safety in cooperative scenarios.
Contribution
The paper proposes XP-MARL, a novel framework that learns priority assignments and uses action propagation to address non-stationarity in cooperative MARL environments.
Findings
XP-MARL improves the safety of the baseline by 84.4% in connected and automated vehicle (CAV) scenarios.
It outperforms the state-of-the-art alternative, which improves the baseline by only 12.8%.
The approach stabilizes learning in multi-agent systems.
Abstract
Non-stationarity poses a fundamental challenge in Multi-Agent Reinforcement Learning (MARL), arising from agents simultaneously learning and altering their policies. This creates a non-stationary environment from the perspective of each individual agent, often leading to suboptimal or even unconverged learning outcomes. We propose an open-source framework named XP-MARL, which augments MARL with auxiliary prioritization to address this challenge in cooperative settings. XP-MARL is 1) founded upon our hypothesis that prioritizing agents and letting higher-priority agents establish their actions first would stabilize the learning process and thus mitigate non-stationarity and 2) enabled by our proposed mechanism called action propagation, where higher-priority agents act first and communicate their actions, providing a more stationary environment for others. Moreover, instead of using a…
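The action propagation idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function `act_in_priority_order`, the `Policy` type, and `toy_policy` are hypothetical names, and the priority scores are fixed numbers here, whereas XP-MARL learns them with priority policies. The sketch only shows the ordering logic: agents act from highest to lowest priority, and each one sees the actions already announced by the agents ranked above it.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical type: a policy maps (own observation, actions already
# announced by higher-priority agents) to an action.
Policy = Callable[[float, Dict[int, float]], float]


def act_in_priority_order(
    observations: Sequence[float],
    priorities: Sequence[float],
    policies: Sequence[Policy],
) -> List[float]:
    """Select actions sequentially, highest priority first."""
    # Rank agent indices by descending priority score (ties broken by index).
    order = sorted(range(len(priorities)), key=lambda i: (-priorities[i], i))
    announced: Dict[int, float] = {}
    for agent in order:
        # Each agent conditions only on actions of agents ranked above it.
        action = policies[agent](observations[agent], dict(announced))
        announced[agent] = action  # propagate to lower-priority agents
    # Return actions indexed by agent, not by priority rank.
    return [announced[i] for i in range(len(priorities))]


def toy_policy(obs: float, higher_actions: Dict[int, float]) -> float:
    # Toy rule (an assumption for illustration): own observation minus
    # the sum of the propagated actions.
    return obs - sum(higher_actions.values())


actions = act_in_priority_order(
    observations=[1.0, 2.0, 3.0],
    priorities=[0.2, 0.9, 0.5],  # fixed scores; learned in XP-MARL
    policies=[toy_policy] * 3,
)
print(actions)
```

In this toy run, agent 1 has the highest score and acts without seeing any other action, agent 2 then conditions on agent 1's action, and agent 0 acts last with both in view. Because lower-priority agents observe committed actions rather than guessing at them, their decision problem depends less on how the others' policies are changing.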
