B2MAPO: A Batch-by-Batch Multi-Agent Policy Optimization to Balance Performance and Efficiency

Wenjing Zhang; Wei Zhang; Wenqing Hu; and Yifan Wang

arXiv:2407.15077 · cs.MA · July 30, 2024

TL;DR

B2MAPO introduces a batch-by-batch multi-agent policy optimization method that balances performance and efficiency by partitioning policies and updating them in batches, with theoretical guarantees and practical advantages demonstrated on complex benchmarks.

Contribution

The paper proposes B2MAPO, a batch-wise policy optimization framework with a hierarchical structure and a directed acyclic graph (DAG) implementation, which improves training efficiency while maintaining high performance.

Findings

01 Outperforms baseline methods on the StarCraft II and Google Football benchmarks.

02 Reduces training time by 60.4% and execution time by 78.7% compared to A2PO.

03 Provides theoretical guarantees for monotonic policy improvement.

Abstract

Most multi-agent reinforcement learning approaches adopt one of two types of policy optimization methods, updating the agents' policies either simultaneously or sequentially. Simultaneously updating the policies of all agents introduces a non-stationarity problem. Although sequentially updating policies agent-by-agent in an appropriate order improves policy performance, it is inefficient because of its sequential execution, which lengthens model training and execution time. Intuitively, partitioning the policies of all agents according to their interdependence and updating the joint policy batch-by-batch can effectively balance performance and efficiency. However, determining the optimal batch partition of policies and the batch updating order is a challenging problem. Firstly, a sequential batched policy updating scheme, B2MAPO (Batch by Batch Multi-Agent Policy Optimization), is proposed with a theoretical…
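To make the batch-by-batch idea concrete, the following is a minimal sketch, not the authors' algorithm. The abstract is truncated before it describes how B2MAPO chooses the partition, so this example substitutes a plain topological layering of a hypothetical dependency DAG: agents whose predecessors have all been updated form the next batch. The names `partition_into_batches`, `update_batch_by_batch`, `update_agent`, and the agent labels are assumptions made for illustration only.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def partition_into_batches(dependencies):
    """Group agents into ordered batches from a dependency DAG.

    `dependencies` maps each agent to the set of agents whose policies
    should be updated before it (a hypothetical input format).
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    batches = []
    while sorter.is_active():
        # Agents whose predecessors have all been marked done.
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches


def update_batch_by_batch(batches, update_agent):
    """Visit batches in order and return the resulting update order.

    Agents inside one batch do not depend on each other, so a real
    system could update them in parallel; this sketch loops over them.
    """
    order = []
    for batch in batches:
        for agent in batch:
            update_agent(agent)  # placeholder for a policy update step
            order.append(agent)
    return order


# Hypothetical interdependence among five agents.
deps = {
    "a1": set(),
    "a2": set(),
    "a3": {"a1"},
    "a4": {"a1", "a2"},
    "a5": {"a3"},
}
batches = partition_into_batches(deps)
order = update_batch_by_batch(batches, update_agent=lambda agent: None)
print(batches)
```

Under these assumptions, five agents are covered in three sequential steps instead of the five an agent-by-agent scheme would need, which is the efficiency trade-off the abstract describes.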

Taxonomy

Topics: Auction Theory and Applications