Adaptively Calibrated Critic Estimates for Deep Reinforcement Learning
Nicolai Dorka, Tim Welschehold, Joschka Boedecker, Wolfram Burgard

TL;DR
This paper introduces Adaptively Calibrated Critics (ACC), a method that uses recent on-policy rollouts to correct the bias of temporal difference value estimates in deep reinforcement learning. This improves performance across several benchmarks without per-environment hyperparameter tuning.
Contribution
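The paper proposes ACC, a method that adaptively calibrates critic estimates using the most recent on-policy rollouts. Applied to Truncated Quantile Critics, it adjusts the bias-controlling hyperparameter during training. This removes the need for a per-environment hyperparameter search and achieves state-of-the-art results among algorithms that do not tune hyperparameters for each environment.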
Findings
ACC sets a new state of the art on the OpenAI gym continuous control benchmark among algorithms that do not tune hyperparameters for each environment.
ACC improves performance on Meta-World robot tasks.
Applying ACC to TD3 improves its performance.
Abstract
Accurate value estimates are important for off-policy reinforcement learning. Algorithms based on temporal difference learning are typically prone to an over- or underestimation bias that builds up over time. In this paper, we propose a general method called Adaptively Calibrated Critics (ACC) that uses the most recent high-variance but unbiased on-policy rollouts to alleviate the bias of the low-variance temporal difference targets. We apply ACC to Truncated Quantile Critics, an algorithm for continuous control that allows regulation of the bias with a hyperparameter tuned per environment. The resulting algorithm adaptively adjusts the parameter during training, rendering hyperparameter search unnecessary, and sets a new state of the art on the OpenAI gym continuous control benchmark among all algorithms that do not tune hyperparameters for each environment. ACC further achieves…
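The abstract combines two ingredients: a bias-controlling knob in Truncated Quantile Critics (how many of the largest pooled quantile atoms are dropped from the target) and an adaptive rule that moves that knob using unbiased on-policy returns. The sketch below illustrates both under stated assumptions. It is not the authors' implementation. The proportional update in `acc_update`, its normalisation by the mean absolute return, and all names (`beta`, `step_size`, `beta_min`, `beta_max`) are hypothetical choices made for illustration. The TQC target also omits the entropy term and terminal-state masking.

```python
from typing import List, Sequence


def discounted_returns(rewards: Sequence[float], gamma: float) -> List[float]:
    """Monte Carlo discounted return G_t for every step of one finished rollout.

    These returns are unbiased but high-variance estimates of Q(s_t, a_t)
    under the policy that generated the rollout.
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def truncated_target(next_atoms: Sequence[Sequence[float]], n_drop_per_net: int,
                     reward: float, gamma: float) -> List[float]:
    """TQC-style truncated distributional target (entropy term and done mask omitted).

    Pools the next-state atoms of all critics, sorts them, and drops the
    largest n_drop_per_net * (number of critics) atoms. Dropping more atoms
    lowers the target and so counteracts overestimation.
    """
    pooled = sorted(z for net in next_atoms for z in net)
    n_keep = len(pooled) - n_drop_per_net * len(next_atoms)
    return [reward + gamma * z for z in pooled[:n_keep]]


def acc_update(beta: float, q_estimates: Sequence[float],
               mc_returns: Sequence[float], step_size: float,
               beta_min: float, beta_max: float) -> float:
    """One hypothetical calibration step for the bias-controlling parameter beta.

    A positive mean error (critic above the observed returns) signals
    overestimation, so beta (the number of atoms to drop) is increased.
    A negative mean error signals underestimation, so beta is decreased.
    The error is divided by the mean absolute return so the step does not
    depend on the reward scale; the result is clipped to [beta_min, beta_max].
    """
    errors = [q - g for q, g in zip(q_estimates, mc_returns)]
    scale = max(sum(abs(g) for g in mc_returns) / len(mc_returns), 1e-8)
    normalised_error = (sum(errors) / len(errors)) / scale
    return min(max(beta + step_size * normalised_error, beta_min), beta_max)


if __name__ == "__main__":
    rets = discounted_returns([1.0, 1.0, 1.0], gamma=0.5)
    # The critic overestimates every return by 1.0, so beta moves up.
    beta = acc_update(2.0, [2.75, 2.5, 2.0], rets, step_size=0.1,
                      beta_min=0.0, beta_max=5.0)
    target = truncated_target([[0.0, 1.0, 2.0], [0.5, 1.5, 2.5]],
                              n_drop_per_net=1, reward=1.0, gamma=0.5)
    print(rets, beta, target)
```

Under these assumptions, a continuous `beta` would be rounded to an integer drop count (for example `round(beta)`) before it is passed to `truncated_target`. Keeping the parameter continuous lets small calibration errors accumulate across updates instead of being lost to rounding.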
Taxonomy
Topics: Reinforcement Learning in Robotics
Methods: Dense Connections · Clipped Double Q-learning · Adam · Target Policy Smoothing · Experience Replay · Twin Delayed Deep Deterministic
