Multi-agent Policy Optimization with Approximatively Synchronous Advantage Estimation

Lipeng Wan; Xuwei Song; Xuguang Lan; Nanning Zheng

arXiv:2012.03488·cs.LG·May 11, 2021·1 cites

TL;DR

This paper introduces an approximatively synchronous advantage estimation method for multi-agent reinforcement learning, reducing bias in policy evaluation and improving performance on complex cooperative tasks.

Contribution

It proposes a novel marginal advantage function and a policy approximation technique to enable synchronous policy evaluation in multi-agent systems.

Findings

01 Achieves superior performance on StarCraft multi-agent challenges

02 Reduces estimation bias in multi-agent advantage functions

03 Breaks down multi-agent optimization into single-agent sub-problems

Abstract

Cooperative multi-agent tasks require agents to deduce their own contributions from shared global rewards, a challenge known as credit assignment. Policy-based multi-agent reinforcement learning methods generally address this challenge by introducing differentiated value functions or advantage functions for individual agents. In a multi-agent system, the policies of different agents need to be evaluated jointly. To update policies synchronously, such value functions or advantage functions also need synchronous evaluation. However, in current methods, value functions or advantage functions use counterfactual joint actions that are evaluated asynchronously and thus suffer from inherent estimation bias. In this work, we propose approximatively synchronous advantage estimation. We first derive the marginal advantage function, an expansion from single-agent advantage function to…
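For readers unfamiliar with the counterfactual joint actions the abstract mentions, the following is a minimal sketch of a COMA-style counterfactual advantage for a single state. It illustrates the kind of estimator used by existing methods, not the estimator proposed in this paper. The Q-table, the uniform policies, and the function name `counterfactual_advantage` are hypothetical and chosen only for illustration.

```python
from itertools import product


def counterfactual_advantage(q, policies, joint_action, agent):
    """COMA-style counterfactual advantage for one agent in a single state.

    q: dict mapping joint-action tuples to Q-values (hypothetical toy table).
    policies: one list of action probabilities per agent (assumed values).
    """
    baseline = 0.0
    for alt_action, prob in enumerate(policies[agent]):
        # Replace only this agent's action; the other agents' actions stay fixed.
        swapped = list(joint_action)
        swapped[agent] = alt_action
        baseline += prob * q[tuple(swapped)]
    return q[tuple(joint_action)] - baseline


# Toy state with two agents and two actions each:
# the team is rewarded only when both agents pick action 1.
q_table = {(0, 0): 0.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 1.0}
policies = [[0.5, 0.5], [0.5, 0.5]]

advantages = {
    joint: [counterfactual_advantage(q_table, policies, joint, i) for i in range(2)]
    for joint in product(range(2), repeat=2)
}
```

In this toy table, each agent's baseline is computed against the specific action the other agent happened to take, so the same individual action can receive a different advantage depending on the teammate's sampled action.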

Taxonomy

Topics: Reinforcement Learning in Robotics · Transportation and Mobility Innovations · Scheduling and Optimization Algorithms