Predictive Auxiliary Learning for Belief-based Multi-Agent Systems
Qinwei Huang, Stefan Wang, Simon Khan, Garrett Katz, Qinru Qiu

TL;DR
This paper introduces BEPAL, a framework that uses auxiliary predictive tasks in multi-agent reinforcement learning to improve training stability and performance in partially observable environments.
Contribution
The paper proposes BEPAL, a novel belief-based auxiliary learning framework that enhances MARL by integrating predictive tasks for unobservable information, leading to better stability and performance.
Findings
BEPAL achieves about 16% performance improvement.
BEPAL demonstrates more stable convergence.
BEPAL is effective in both predator-prey and football environments.
Abstract
The performance of multi-agent reinforcement learning (MARL) in partially observable environments depends on effectively aggregating information from observations, communications, and reward signals. While most existing multi-agent systems primarily rely on rewards as the only feedback for policy training, our research shows that introducing auxiliary predictive tasks can significantly enhance learning efficiency and stability. We propose Belief-based Predictive Auxiliary Learning (BEPAL), a framework that incorporates auxiliary training objectives to support policy optimization. BEPAL follows the centralized training with decentralized execution paradigm. Each agent learns a belief model that predicts unobservable state information, such as other agents' rewards or motion directions, alongside its policy model. By enriching hidden state representations with information that does not…
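The idea of training a belief head alongside the policy can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the linear heads, the plain policy-gradient loss used as the reinforcement learning objective, the choice of another agent's motion direction as the prediction target, and the `aux_weight` coefficient are all assumptions made for illustration. The sketch only shows how an auxiliary prediction loss might be added to a policy loss computed from the same hidden state.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    z = x - x.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(logits, labels):
    # Mean negative log-likelihood of the integer labels.
    probs = softmax(logits)
    rows = np.arange(logits.shape[0])
    return float(-np.log(probs[rows, labels] + 1e-12).mean())

def policy_gradient_loss(action_logits, actions, advantages):
    # Stand-in RL objective (assumption): -E[log pi(a|h) * advantage].
    log_probs = np.log(softmax(action_logits) + 1e-12)
    chosen = log_probs[np.arange(len(actions)), actions]
    return float(-(chosen * advantages).mean())

def combined_loss(hidden, w_policy, w_belief, actions, advantages,
                  other_directions, aux_weight=0.5):
    # Both heads read the same hidden state, so the auxiliary task
    # shapes the representation that the policy also uses.
    action_logits = hidden @ w_policy   # policy head (hypothetical linear layer)
    belief_logits = hidden @ w_belief   # belief head (hypothetical linear layer)
    rl = policy_gradient_loss(action_logits, actions, advantages)
    aux = cross_entropy(belief_logits, other_directions)
    return rl + aux_weight * aux, rl, aux

# Toy batch: 8 steps, 16-dim hidden state, 5 actions, 4 motion directions.
rng = np.random.default_rng(0)
hidden = rng.normal(size=(8, 16))
w_policy = rng.normal(size=(16, 5)) * 0.1
w_belief = rng.normal(size=(16, 4)) * 0.1
actions = rng.integers(0, 5, size=8)
advantages = rng.normal(size=8)
other_directions = rng.integers(0, 4, size=8)  # only needed during training

total, rl, aux = combined_loss(hidden, w_policy, w_belief,
                               actions, advantages, other_directions)
print(f"rl={rl:.4f} aux={aux:.4f} total={total:.4f}")
```

In this sketch the prediction labels are consumed only when the loss is computed, which is consistent with centralized training; at execution time an agent would need only its own hidden state and policy head.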
Taxonomy
Topics: Reinforcement Learning in Robotics · Explainable Artificial Intelligence (XAI) · Domain Adaptation and Few-Shot Learning
