Multi-agent Policy Optimization with Approximatively Synchronous Advantage Estimation
Lipeng Wan, Xuwei Song, Xuguang Lan, Nanning Zheng

TL;DR
This paper introduces an approximatively synchronous advantage estimation method for multi-agent reinforcement learning, reducing bias in policy evaluation and improving performance on complex cooperative tasks.
Contribution
It proposes a novel marginal advantage function and a policy approximation technique to enable synchronous policy evaluation in multi-agent systems.
Findings
Achieves superior performance on StarCraft multi-agent challenges
Reduces estimation bias in multi-agent advantage functions
Breaks down multi-agent optimization into single-agent sub-problems
Abstract
Cooperative multi-agent tasks require agents to deduce their own contributions from a shared global reward, a challenge known as credit assignment. To address it, policy-based multi-agent reinforcement learning methods generally introduce differentiated value functions or advantage functions for individual agents. In a multi-agent system, the policies of different agents need to be evaluated jointly. To update policies synchronously, these value functions or advantage functions also need to be evaluated synchronously. However, in current methods, value functions or advantage functions use counterfactual joint actions that are evaluated asynchronously, and thus suffer from natural estimation bias. In this work, we propose the approximatively synchronous advantage estimation. We first derive the marginal advantage function, an expansion from single-agent advantage function to…
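To make the idea of a per-agent advantage built from counterfactual joint actions concrete, here is a minimal sketch in Python. It assumes a counterfactual baseline in the style of COMA, which the abstract does not name. The joint action values `q`, the `policies`, and the function `counterfactual_advantage` are all hypothetical illustrations. It does not implement the paper's marginal advantage function, whose definition is cut off above.

```python
def counterfactual_advantage(q, policies, joint_action, agent):
    """Q of the joint action minus a counterfactual baseline for one agent.

    The baseline averages Q over the chosen agent's own alternative actions,
    weighted by that agent's policy, while the other agents' actions are
    held fixed at the values in joint_action.
    """
    baseline = 0.0
    for alt_action, prob in enumerate(policies[agent]):
        alt_joint = list(joint_action)
        alt_joint[agent] = alt_action  # counterfactual joint action
        baseline += prob * q[tuple(alt_joint)]
    return q[tuple(joint_action)] - baseline


# Hypothetical joint action values for a single state:
# two agents, two actions each (illustrative numbers only).
q = {(0, 0): 1.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 2.0}

# Hypothetical per-agent policies: policies[i][a] is the probability
# that agent i picks action a.
policies = [[0.5, 0.5], [0.25, 0.75]]

adv0 = counterfactual_advantage(q, policies, (1, 1), agent=0)
adv1 = counterfactual_advantage(q, policies, (1, 1), agent=1)
print(adv0, adv1)
```

Each agent's baseline holds the other agent's action fixed, so the two advantages are computed against different counterfactual joint actions. The sketch only illustrates that construction under the stated assumptions; it makes no claim about how the paper corrects it.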
Taxonomy
Topics: Reinforcement Learning in Robotics · Transportation and Mobility Innovations · Scheduling and Optimization Algorithms
