Model-based Multi-agent Policy Optimization with Adaptive Opponent-wise Rollouts
Weinan Zhang, Xihuai Wang, Jian Shen, Ming Zhou

TL;DR
This paper introduces AORPO, a decentralized model-based multi-agent reinforcement learning method that adaptively models opponents to improve sample efficiency and convergence in both cooperative and competitive tasks.
Contribution
The paper proposes AORPO, a novel decentralized approach with adaptive opponent modeling, and provides theoretical analysis and empirical validation of its improved sample efficiency.
Findings
AORPO achieves improved sample efficiency in experiments on competitive and cooperative tasks.
Theoretical convergence of AORPO is established under reasonable assumptions.
Empirical results show comparable asymptotic performance with improved learning speed.
Abstract
This paper investigates model-based methods in multi-agent reinforcement learning (MARL). We specify the dynamics sample complexity and the opponent sample complexity in MARL, and conduct a theoretical analysis of the return discrepancy upper bound. To reduce this upper bound and keep sample complexity low throughout the learning process, we propose a novel decentralized model-based MARL method, named Adaptive Opponent-wise Rollout Policy Optimization (AORPO). In AORPO, each agent builds its own multi-agent environment model, consisting of a dynamics model and multiple opponent models, and trains its policy with the adaptive opponent-wise rollout. We further prove the theoretical convergence of AORPO under reasonable assumptions. Empirical experiments on competitive and cooperative tasks demonstrate that AORPO can achieve improved sample efficiency with comparable asymptotic…
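The abstract describes each agent rolling out its own environment model, built from one dynamics model and one model per opponent. A minimal Python sketch of what an opponent-wise rollout could look like follows. The length rule (scaling each opponent's model-rollout length by the ratio of the smallest model error to its own error), the fallback to querying the opponent's actual policy, and every function and parameter name are assumptions for illustration; none of them is stated in this summary.

```python
import math


def opponent_rollout_lengths(model_errors, max_length):
    """Hypothetical rule: shorter model rollouts for less accurate opponent models."""
    if any(err <= 0 for err in model_errors):
        raise ValueError("model errors must be positive")
    best = min(model_errors)
    # The most accurate opponent model is used for all max_length steps;
    # the others are used for a proportionally shorter prefix.
    return [math.floor(max_length * (best / err)) for err in model_errors]


def opponent_wise_rollout(state, own_policy, opponent_models, opponent_policies,
                          dynamics_model, lengths, horizon):
    """Simulate `horizon` steps with the learned dynamics model."""
    trajectory = []
    for t in range(horizon):
        own_action = own_policy(state)
        opponent_actions = [
            # Learned opponent model for the first lengths[j] steps,
            # then the opponent's actual policy (assumed to be queryable).
            opponent_models[j](state) if t < lengths[j] else opponent_policies[j](state)
            for j in range(len(lengths))
        ]
        next_state, reward = dynamics_model(state, own_action, opponent_actions)
        trajectory.append((state, own_action, opponent_actions, reward, next_state))
        state = next_state
    return trajectory


# Toy stand-ins (all hypothetical) so the sketch runs end to end.
lengths = opponent_rollout_lengths([0.25, 0.5, 1.0], max_length=8)
rollout = opponent_wise_rollout(
    state=0,
    own_policy=lambda s: "own",
    opponent_models=[lambda s: "model"] * 3,
    opponent_policies=[lambda s: "real"] * 3,
    dynamics_model=lambda s, a, opp: (s + 1, 0.0),
    lengths=lengths,
    horizon=8,
)
print(lengths)
print(rollout[3][2])
```

With the toy errors 0.25, 0.5, and 1.0, the sketch assigns model-rollout lengths of 8, 4, and 2 steps, so at step index 3 the first two opponents are still simulated by their models while the third is already queried directly. The point of the design is that an inaccurate opponent model contributes fewer simulated steps, which limits how much its error can compound over the rollout.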
Taxonomy
Topics: Reinforcement Learning in Robotics · Experimental Behavioral Economics Studies · Adaptive Dynamic Programming Control
