Model-based Multi-agent Policy Optimization with Adaptive Opponent-wise Rollouts

Weinan Zhang; Xihuai Wang; Jian Shen; Ming Zhou

arXiv:2105.03363 · cs.LG · March 18, 2022 · 1 citation

TL;DR

This paper introduces AORPO, a decentralized model-based multi-agent reinforcement learning method in which each agent learns a dynamics model and opponent models and trains its policy with adaptive opponent-wise rollouts, improving sample efficiency on both cooperative and competitive tasks.

Contribution

The paper proposes AORPO, a novel decentralized model-based MARL method built on adaptive opponent-wise rollouts, together with a theoretical analysis (a return discrepancy upper bound and a convergence proof) and empirical validation of its improved sample efficiency.

Findings

01. AORPO achieves higher sample efficiency than existing MARL methods.

02. Theoretical convergence of AORPO is established under reasonable assumptions.

03. Empirical results show comparable asymptotic performance with improved learning speed.

Abstract

This paper investigates model-based methods in multi-agent reinforcement learning (MARL). We specify the dynamics sample complexity and the opponent sample complexity in MARL, and conduct a theoretical analysis of the upper bound on the return discrepancy. To reduce this upper bound and thereby keep sample complexity low throughout learning, we propose a novel decentralized model-based MARL method, named Adaptive Opponent-wise Rollout Policy Optimization (AORPO). In AORPO, each agent builds its own multi-agent environment model, consisting of a dynamics model and multiple opponent models, and trains its policy with the adaptive opponent-wise rollout. We further prove the theoretical convergence of AORPO under reasonable assumptions. Empirical experiments on competitive and cooperative tasks demonstrate that AORPO can achieve improved sample efficiency with comparable asymptotic…
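The abstract names the components of the method but not the procedure, so the following is only a rough Python sketch of what an opponent-wise rollout could look like, not the authors' implementation. It assumes that, within a k-step model rollout, each opponent's action comes from that opponent's learned model for the first few steps and from the real opponent afterwards, and that the per-opponent length shrinks as that opponent model's error grows. The function names (`adaptive_lengths`, `opponent_wise_rollout`), the proportional length rule, and the toy callables are all hypothetical.

```python
import math
from typing import Callable, Dict, List, Tuple

State = float
Action = float
OpponentFn = Callable[[State], Action]


def adaptive_lengths(errors: Dict[str, float], k: int) -> Dict[str, int]:
    """Assumed rule: the most accurate opponent model is trusted for all k
    steps; the others get a share proportional to (smallest error / own error).
    Errors must be strictly positive."""
    smallest = min(errors.values())
    return {name: math.floor(k * (smallest / err)) for name, err in errors.items()}


def opponent_wise_rollout(
    state: State,
    policy: Callable[[State], Action],
    dynamics_model: Callable[[State, Action, Dict[str, Action]], State],
    opponent_models: Dict[str, OpponentFn],
    real_opponents: Dict[str, OpponentFn],
    lengths: Dict[str, int],
    k: int,
) -> Tuple[List[tuple], Dict[str, int]]:
    """Roll the learned dynamics model forward k steps. Each opponent is
    simulated by its model while step < lengths[name], then queried directly.
    Returns the trajectory and the number of real-opponent queries made."""
    trajectory = []
    queries = {name: 0 for name in opponent_models}
    for step in range(k):
        own_action = policy(state)
        opp_actions = {}
        for name, model in opponent_models.items():
            if step < lengths[name]:
                opp_actions[name] = model(state)  # learned opponent model
            else:
                opp_actions[name] = real_opponents[name](state)  # real opponent
                queries[name] += 1
        next_state = dynamics_model(state, own_action, opp_actions)
        trajectory.append((state, own_action, opp_actions, next_state))
        state = next_state
    return trajectory, queries


if __name__ == "__main__":
    # Toy setup: scalar state, two opponents with made-up model errors.
    lengths = adaptive_lengths({"opp_1": 0.5, "opp_2": 1.0}, k=4)
    models = {name: (lambda s: 0.0) for name in lengths}
    reals = {name: (lambda s: 1.0) for name in lengths}
    traj, queries = opponent_wise_rollout(
        0.0, lambda s: 1.0, lambda s, a, o: s + a + sum(o.values()),
        models, reals, lengths, k=4,
    )
    print(lengths, queries, traj[-1][3])
```

In this sketch the `queries` counter stands in for the opponent sample complexity mentioned in the abstract: opponents with less accurate models are queried for more of the rollout, while accurate models replace real interaction.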

Code & Models

Repositories

apexrl/AORPO
PyTorch · Official

Taxonomy

Topics: Reinforcement Learning in Robotics · Experimental Behavioral Economics Studies · Adaptive Dynamic Programming Control