Automating Control of Overestimation Bias for Reinforcement Learning

Arsenii Kuznetsov; Alexander Grishin; Artem Tsypin; Arsenii Ashukha; Artur Kadurin; Dmitry Vetrov

arXiv:2110.13523 · cs.LG · February 1, 2022

TL;DR

This paper introduces a data-driven method for automatically tuning bias control hyperparameters in reinforcement learning, reducing the need for manual tuning and improving sample efficiency across multiple algorithms.

Contribution

The paper presents a general approach for automatically selecting the hyperparameters of overestimation bias control. The approach applies to several RL algorithms and removes the need for extensive hyperparameter searches.
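
The TL;DR and contribution statements describe tuning a bias-control hyperparameter from data instead of by grid search. As a rough illustration only, the sketch below assumes one plausible rule: compare the critic's Q-estimates for visited state-action pairs with the discounted returns actually observed in recent episodes, then nudge a pessimism knob beta up when the critic overestimates and down when it underestimates. The function names `discounted_returns` and `update_bias_param`, the step size `lr`, the bounds, and the normalisation by mean absolute return are all hypothetical choices, not necessarily the authors' exact update.

```python
import numpy as np


def discounted_returns(rewards, gamma):
    """Monte Carlo discounted return G_t for every step of one finished episode."""
    returns = np.zeros(len(rewards))
    g = 0.0
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g
        returns[t] = g
    return returns


def update_bias_param(beta, q_estimates, mc_returns, lr=0.1,
                      beta_min=0.0, beta_max=5.0, eps=1e-8):
    """One step of a hypothetical data-driven rule for a bias-control knob beta.

    Larger beta means stronger pessimism (for example, more dropped quantiles
    in TQC). If the critic overestimates the observed returns, beta increases;
    if it underestimates them, beta decreases. The result is clipped to bounds.
    """
    q = np.asarray(q_estimates, dtype=float)
    r = np.asarray(mc_returns, dtype=float)
    # Signed bias, divided by the return scale (an assumption) so that lr is
    # less sensitive to the reward magnitude of a particular environment.
    scale = np.mean(np.abs(r)) + eps
    bias = np.mean(q - r) / scale
    return float(np.clip(beta + lr * bias, beta_min, beta_max))


# Usage: a critic that overestimates every observed return by 1.0.
returns = discounted_returns([1.0, 1.0, 1.0], gamma=0.5)
new_beta = update_bias_param(1.0, returns + 1.0, returns)
print(returns, new_beta)
```

Because the rule only needs returns from episodes the agent already collects, it adds no extra environment interactions; how often to apply it and over how many recent episodes are further assumptions left open here.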

Findings

1. Reduces hyperparameter tuning effort

2. Maintains performance while requiring fewer environment interactions

3. Effective across multiple RL algorithms

Abstract

Overestimation bias control techniques are used by the majority of high-performing off-policy reinforcement learning algorithms. However, most of these techniques rely on pre-defined bias correction policies that are either not flexible enough or require environment-specific tuning of hyperparameters. In this work, we present a general data-driven approach for the automatic selection of bias control hyperparameters. We demonstrate its effectiveness on three algorithms: Truncated Quantile Critics, Weighted Delayed DDPG, and Maxmin Q-learning. The proposed technique eliminates the need for an extensive hyperparameter search. We show that it leads to a significant reduction of the actual number of interactions while preserving the performance.
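
The abstract names three algorithms, each of which exposes a hyperparameter that sets how pessimistic the critic's target is: the number of dropped quantile atoms in Truncated Quantile Critics, the weight on the minimum in Weighted Delayed DDPG, and the number of critics in Maxmin Q-learning. The simplified sketch below shows these knobs on scalar targets under our own assumptions: TQC's truncated atom set is collapsed to its mean (the real method keeps the remaining atoms as a distributional target), and the WD3 weighting is written as beta * min + (1 - beta) * mean of two critics, as we understand that method. The function names are hypothetical.

```python
import numpy as np


def tqc_target(atoms, drop_per_critic):
    """TQC-style truncation (simplified to a scalar).

    atoms: array of shape (n_critics, n_atoms) with quantile estimates for the
    next state-action pair. All atoms are pooled and sorted, the largest
    drop_per_critic * n_critics are discarded, and the rest are averaged.
    """
    atoms = np.asarray(atoms, dtype=float)
    n_critics = atoms.shape[0]
    pooled = np.sort(atoms.ravel())
    keep = pooled.size - drop_per_critic * n_critics
    if keep <= 0:
        raise ValueError("cannot drop every atom")
    return float(pooled[:keep].mean())


def wd3_target(q1, q2, beta):
    """WD3-style weighting: beta * min + (1 - beta) * mean of two critics."""
    return beta * min(q1, q2) + (1.0 - beta) * 0.5 * (q1 + q2)


def maxmin_target(q_values):
    """Maxmin-style target: minimum over N estimates, where N = len(q_values)."""
    return float(np.min(q_values))


atoms = [[1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0]]
print(tqc_target(atoms, 0), tqc_target(atoms, 1))   # dropping atoms lowers the target
print(wd3_target(1.0, 3.0, 0.0), wd3_target(1.0, 3.0, 1.0))
print(maxmin_target([3.0, 2.0]), maxmin_target([3.0, 2.0, 1.0]))
```

In each case, moving the knob toward more pessimism (dropping more atoms, a larger beta, more critics) never raises the target. This single monotone scalar is the kind of hyperparameter that an automatic selection rule would adjust instead of a per-environment search.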


Taxonomy

Topics: Reinforcement Learning in Robotics

Methods: Batch Normalization · Convolution · Weight Decay · Dense Connections · Experience Replay · Adam · Deep Deterministic Policy Gradient