Automating Control of Overestimation Bias for Reinforcement Learning
Arsenii Kuznetsov, Alexander Grishin, Artem Tsypin, Arsenii Ashukha, Artur Kadurin, Dmitry Vetrov

TL;DR
This paper introduces a data-driven method for automatically tuning bias control hyperparameters in reinforcement learning, reducing the need for manual tuning and improving sample efficiency across multiple algorithms.
Contribution
It presents a general approach for automatic hyperparameter selection in overestimation bias control, applicable to various RL algorithms, eliminating extensive hyperparameter searches.
Findings
Reduces hyperparameter tuning effort
Reduces the number of environment interactions while preserving performance
Effective across multiple RL algorithms
Abstract
Overestimation bias control techniques are used by the majority of high-performing off-policy reinforcement learning algorithms. However, most of these techniques rely on pre-defined bias correction policies that are either not flexible enough or require environment-specific tuning of hyperparameters. In this work, we present a general data-driven approach for the automatic selection of bias control hyperparameters. We demonstrate its effectiveness on three algorithms: Truncated Quantile Critics, Weighted Delayed DDPG, and Maxmin Q-learning. The proposed technique eliminates the need for an extensive hyperparameter search. We show that it leads to a significant reduction of the actual number of interactions while preserving the performance.
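To make the notion of a bias control hyperparameter concrete, here is a minimal sketch of a Maxmin-style target, which takes the minimum over an ensemble of value estimates before maximizing over actions. It is an illustration under stated assumptions, not the paper's tuning procedure: the function name maxmin_target_bias, the Gaussian noise model, and the setting where every true action value is 0 are all hypothetical choices made for this example. The ensemble size plays the role of the hyperparameter that would otherwise be tuned by hand.

```python
import random
import statistics


def maxmin_target_bias(num_estimators, num_actions=10, noise_std=1.0,
                       trials=2000, seed=0):
    """Monte Carlo estimate of the bias of max_a min_i Q_i(s, a).

    Assumption for illustration: every true action value is 0, and each
    estimator returns the true value plus independent Gaussian noise.
    The true maximum is therefore 0, so the mean target equals the bias.
    """
    rng = random.Random(seed)
    targets = []
    for _ in range(trials):
        # For each action, take the minimum over the ensemble of noisy estimates.
        per_action_min = [
            min(rng.gauss(0.0, noise_std) for _ in range(num_estimators))
            for _ in range(num_actions)
        ]
        # The target then maximizes over actions.
        targets.append(max(per_action_min))
    return statistics.fmean(targets)


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        print(f"estimators={n}: estimated bias={maxmin_target_bias(n):+.3f}")
```

With a single estimator the target is a plain maximum over noisy values and is biased upward. Adding estimators pushes the bias down and eventually makes it negative, so the best ensemble size depends on the noise level and the number of actions. This dependence is why such a hyperparameter typically needs environment-specific tuning.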
Taxonomy
Topics: Reinforcement Learning in Robotics
Methods: Batch Normalization · Convolution · Weight Decay · Dense Connections · Experience Replay · Adam · Deep Deterministic Policy Gradient
