Reducing Overestimation Bias in Multi-Agent Domains Using Double Centralized Critics

Johannes Ackermann; Volker Gabler; Takayuki Osa; Masashi Sugiyama

arXiv:1910.01465·cs.LG·December 3, 2019·70 cites


TL;DR

This paper introduces a method using double centralized critics to reduce overestimation bias in multi-agent reinforcement learning, improving policy learning efficiency in cooperative-competitive and robotic tasks.

Contribution

The paper proposes double centralized critics to reduce value function overestimation bias in multi-agent RL, which improves learning performance.

Findings

01. Shows a significant advantage over current methods on six mixed cooperative-competitive tasks.

02. Applies to high-dimensional robotic tasks.

03. Learns decentralized policies in the robotic domain.

Abstract

Many real-world tasks require multiple agents to work together. Multi-agent reinforcement learning (RL) methods have been proposed in recent years to solve these tasks, but current methods often fail to efficiently learn policies. We thus investigate the presence of a common weakness in single-agent RL, namely value function overestimation bias, in the multi-agent setting. Based on our findings, we propose an approach that reduces this bias by using double centralized critics. We evaluate it on six mixed cooperative-competitive tasks, showing a significant advantage over current methods. Finally, we investigate the application of multi-agent methods to high-dimensional robotic tasks and show that our approach can be used to learn decentralized policies in this domain.
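
The abstract does not spell out how the two critics are combined, so the following is only a minimal sketch under stated assumptions: it assumes a TD3-style "clipped" target that takes the smaller of two critic estimates, and it assumes each centralized critic sees the concatenated observations and actions of all agents. The function names (centralized_input, clipped_double_q_target, overestimation_demo), the discount value, and the toy noise model are hypothetical and chosen for illustration, not taken from the paper.

```python
import numpy as np


def centralized_input(observations, actions):
    """Build one critic input per batch row from all agents' data.

    Assumption: a centralized critic conditions on every agent's
    observation and action, concatenated along the feature axis.
    """
    return np.concatenate(list(observations) + list(actions), axis=-1)


def clipped_double_q_target(reward, done, q1_next, q2_next, gamma=0.95):
    """TD target that uses the smaller of two critic estimates.

    Assumption: this is the TD3-style minimum, used here as a stand-in
    for "double centralized critics".
    """
    reward = np.asarray(reward, dtype=float)
    done = np.asarray(done, dtype=float)
    q_min = np.minimum(q1_next, q2_next)
    return reward + gamma * (1.0 - done) * q_min


def overestimation_demo(n_actions=4, n_samples=10_000, seed=0):
    """Toy illustration of overestimation bias.

    All true action values are zero, so any positive average of the
    greedy value estimate is pure bias caused by estimation noise.
    """
    rng = np.random.default_rng(seed)
    noise1 = rng.normal(size=(n_samples, n_actions))
    noise2 = rng.normal(size=(n_samples, n_actions))
    # Greedy value under a single noisy critic.
    single = noise1.max(axis=1).mean()
    # Greedy value when each action is scored by the smaller of two critics.
    clipped = np.minimum(noise1, noise2).max(axis=1).mean()
    return single, clipped


if __name__ == "__main__":
    single, clipped = overestimation_demo()
    print(f"single critic bias:  {single:.3f}")
    print(f"clipped double bias: {clipped:.3f}")
```

In this toy setting the clipped estimate can never exceed the single-critic estimate, because the elementwise minimum of two estimates is at most the first one. Whether the same mechanism accounts for the gains reported in the paper is a question for the full text, not something this sketch establishes.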


Taxonomy

Topics: Reinforcement Learning in Robotics · Experimental Behavioral Economics Studies · Adaptive Dynamic Programming Control