Adaptively Calibrated Critic Estimates for Deep Reinforcement Learning

Nicolai Dorka; Tim Welschehold; Joschka Boedecker; Wolfram Burgard

arXiv:2111.12673·cs.LG·October 24, 2022

TL;DR

This paper introduces Adaptively Calibrated Critics (ACC), a method that dynamically adjusts value estimates in deep reinforcement learning to reduce bias, leading to improved performance across various benchmarks without extensive hyperparameter tuning.

Contribution

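The paper proposes ACC, a general method that calibrates critic estimates adaptively using the most recent on-policy rollouts. Applied to Truncated Quantile Critics, it removes the need for a per-environment hyperparameter search and sets a new state of the art on the OpenAI gym continuous control benchmark among algorithms that do not tune hyperparameters per environment.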

Findings

01 ACC sets a new state of the art on the OpenAI gym continuous control benchmark among algorithms that do not tune hyperparameters for each environment.

02 ACC improves performance on Meta-World robot tasks.

03 Applying ACC to TD3 enhances its effectiveness.

Abstract

Accurate value estimates are important for off-policy reinforcement learning. Algorithms based on temporal difference learning are typically prone to an over- or underestimation bias that builds up over time. In this paper, we propose a general method called Adaptively Calibrated Critics (ACC) that uses the most recent high-variance but unbiased on-policy rollouts to alleviate the bias of the low-variance temporal difference targets. We apply ACC to Truncated Quantile Critics, an algorithm for continuous control that allows regulation of the bias with a hyperparameter tuned per environment. The resulting algorithm adaptively adjusts the parameter during training, rendering hyperparameter search unnecessary, and sets a new state of the art on the OpenAI gym continuous control benchmark among all algorithms that do not tune hyperparameters for each environment. ACC further achieves…
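
The calibration idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation (see the official repository for that); it only assumes that the critic exposes a scalar bias-controlling parameter, here called beta, where a larger value yields more optimistic estimates. The function names, the step size, the bounds, and the plain clipped update rule are all hypothetical choices made for illustration.

```python
import numpy as np


def discounted_returns(rewards, gamma=0.99):
    """Monte Carlo discounted return for every step of one finished rollout."""
    returns = np.zeros(len(rewards), dtype=np.float64)
    running = 0.0
    # Accumulate from the last step backwards: G_t = r_t + gamma * G_{t+1}.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def update_beta(beta, q_estimates, mc_returns,
                step_size=0.1, beta_min=0.0, beta_max=5.0):
    """Nudge a hypothetical bias parameter toward agreement with rollouts.

    Assumption: larger beta means more optimistic critic estimates.
    If the critic sits above the observed returns (overestimation), the
    mean gap is negative and beta decreases; if it sits below, beta grows.
    """
    gap = float(np.mean(np.asarray(mc_returns) - np.asarray(q_estimates)))
    return float(np.clip(beta + step_size * gap, beta_min, beta_max))


# Toy usage: a three-step rollout and a critic that overestimates.
returns = discounted_returns([1.0, 1.0, 1.0], gamma=0.5)
beta = update_beta(2.0, q_estimates=[3.0, 3.0, 3.0], mc_returns=returns)
```

In the toy usage the critic's estimates exceed the observed returns, so beta moves down from 2.0. How such a parameter maps onto the bias-regulating hyperparameter of Truncated Quantile Critics is left out of this sketch.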

Code & Models

Repositories

nicolinho/acc
PyTorch · Official

Taxonomy

Topics: Reinforcement Learning in Robotics

Methods: Dense Connections · Clipped Double Q-learning · Adam · Target Policy Smoothing · Experience Replay · Twin Delayed Deep Deterministic