Controlling Overestimation Bias with Truncated Mixture of Continuous Distributional Quantile Critics

Arsenii Kuznetsov; Pavel Shvechikov; Alexander Grishin; Dmitry Vetrov

arXiv:2005.04269·cs.LG·May 12, 2020·53 cites

TL;DR

This paper introduces Truncated Quantile Critics (TQC), a method that combines distributional critics, truncation, and ensembling to reduce overestimation bias in continuous control, outperforming the prior state of the art on every environment in the benchmark suite.

Contribution

TQC is the first approach to integrate distributional critics, truncation, and ensembling specifically for controlling overestimation bias in continuous control tasks.

Findings

01

TQC outperforms existing methods on all benchmark environments.

02

TQC achieves a 25% improvement on Humanoid, the most challenging environment in the suite.

03

Effectively reduces overestimation bias in off-policy learning.

Abstract

Overestimation bias is one of the major impediments to accurate off-policy learning. This paper investigates a novel way to alleviate overestimation bias in a continuous control setting. Our method, Truncated Quantile Critics (TQC), blends three ideas: a distributional representation of the critic, truncation of the critics' predictions, and ensembling of multiple critics. Distributional representation and truncation allow for arbitrarily granular control of overestimation, while ensembling provides additional score improvements. TQC outperforms the current state of the art on all environments from the continuous control benchmark suite, demonstrating a 25% improvement on the most challenging Humanoid environment.
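The way the three ideas fit together can be illustrated with a minimal NumPy sketch. It is one plausible reading of the abstract, not the authors' implementation: each critic is assumed to output a fixed number of atoms (quantile estimates of the return), the atoms of all critics are pooled into one mixture, and the largest atoms are dropped before forming the learning target. The function names, the `drop_per_critic` parameter, and the simplified target (reward plus discounted atoms, with no other terms) are assumptions made for illustration.

```python
import numpy as np


def truncated_mixture_atoms(critic_atoms, drop_per_critic):
    """Pool atoms from all critics, sort them, and drop the largest ones.

    critic_atoms: array-like of shape (n_critics, n_atoms); each row holds one
        critic's atoms for the same state-action pair (hypothetical layout).
    drop_per_critic: number of atoms to discard per critic (hypothetical knob
        that sets how aggressively overestimation is suppressed).
    """
    atoms = np.asarray(critic_atoms, dtype=float)
    n_critics, n_atoms = atoms.shape
    if not 0 <= drop_per_critic < n_atoms:
        raise ValueError("drop_per_critic must be in [0, n_atoms)")

    # Ensembling: treat the atoms of all critics as one mixture.
    pooled = np.sort(atoms.reshape(-1))

    # Truncation: keep only the smallest atoms of the mixture.
    keep = n_critics * (n_atoms - drop_per_critic)
    return pooled[:keep]


def truncated_target(critic_atoms, reward, discount, drop_per_critic):
    """Simplified target atoms: reward + discount * kept atoms (assumption)."""
    kept = truncated_mixture_atoms(critic_atoms, drop_per_critic)
    return reward + discount * kept
```

For example, with two critics producing atoms `[1, 5, 9]` and `[2, 4, 8]` and `drop_per_critic=1`, the pooled and sorted mixture is `[1, 2, 4, 5, 8, 9]` and the kept atoms are `[1, 2, 4, 5]`, so the mean of the target distribution falls from about 4.83 to 3.0. Because the number of dropped atoms can be varied one atom at a time, the amount of pessimism is adjustable in small steps, which is the sense in which the abstract describes the control as granular.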

Code & Models

Videos

Controlling Overestimation Bias with Truncated Mixture of Continuous Distributional Quantile Critics (SlidesLive)

Taxonomy

Topics: Model Reduction and Neural Networks · Adversarial Robustness in Machine Learning · Reinforcement Learning in Robotics