Balancing Value Underestimation and Overestimation with Realistic Actor-Critic

Sicen Li; Qinyun Tang; Yiming Pang; Xinmeng Ma; Gang Wang

arXiv:2110.09712 · cs.LG · October 27, 2022 · 1 citation

TL;DR

This paper presents Realistic Actor-Critic (RAC), a model-free RL algorithm that improves sample efficiency by balancing value underestimation and overestimation using uncertainty-aware critics, leading to significant performance gains.

Contribution

RAC introduces a novel approach combining UVFA and uncertainty punished Q-learning to enhance sample efficiency in off-policy RL algorithms.
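
The UVFA idea mentioned above can be pictured as conditioning a single network on a trade-off parameter, so that one set of weights represents a whole family of policies. The sketch below is illustrative only: the names `sample_beta` and `conditioned_input`, the uniform sampling, and the range `beta_max` are assumptions, not details taken from this page.

```python
import random


def sample_beta(rng, beta_max=1.0):
    """Draw a trade-off parameter (assumed here: uniform on [0, beta_max])."""
    return rng.uniform(0.0, beta_max)


def conditioned_input(state, beta):
    """Append beta to the state features.

    Feeding beta as an extra input lets one network represent
    many policies, each with its own degree of pessimism.
    """
    return list(state) + [beta]


rng = random.Random(0)            # fixed seed so the sketch is repeatable
beta = sample_beta(rng, beta_max=0.8)
features = conditioned_input([0.1, -0.3], beta)
# features now holds the two state values followed by the sampled beta
```

Under these assumptions, a small beta would correspond to a more optimistic member of the family and a large beta to a more pessimistic one.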

Findings

1. Achieves 10x sample efficiency on MuJoCo benchmarks.

2. Improves performance by 25% on Humanoid environment.

3. Successfully balances value estimation trade-offs in continuous control tasks.

Abstract

Model-free deep reinforcement learning (RL) has been successfully applied to challenging continuous control domains. However, poor sample efficiency prevents these methods from being widely used in real-world domains. This paper introduces a novel model-free algorithm, Realistic Actor-Critic (RAC), which can be incorporated into any off-policy RL algorithm to improve sample efficiency. RAC employs Universal Value Function Approximators (UVFA) to simultaneously learn a policy family with the same neural network, each policy with a different trade-off between underestimation and overestimation. To learn such policies, we introduce uncertainty punished Q-learning, which uses uncertainty from an ensemble of multiple critics to build various confidence bounds of the Q-function. We evaluate RAC on the MuJoCo benchmark, achieving 10x sample efficiency and 25% performance improvement on the most…
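
One way to read "confidence bounds of the Q-function" is as the ensemble mean of the critics' estimates minus a multiple of their disagreement. The following is a minimal sketch under that assumption. The function names, the use of the population standard deviation as the uncertainty measure, and the plain one-step TD target are illustrative choices, not the authors' stated implementation.

```python
from statistics import mean, pstdev


def punished_q(q_estimates, beta):
    """Confidence bound built from an ensemble of critic estimates.

    q_estimates: one Q estimate per critic for the same (state, action).
    beta: penalty weight; 0 gives the plain ensemble mean, larger
          values give a more pessimistic (lower) bound.
    """
    return mean(q_estimates) - beta * pstdev(q_estimates)


def td_target(reward, done, next_q_estimates, beta, gamma=0.99):
    """One-step TD target that bootstraps from the punished bound."""
    bootstrap = 0.0 if done else punished_q(next_q_estimates, beta)
    return reward + gamma * bootstrap


# Two critics disagree about the next state's value: 1.0 versus 3.0.
optimistic = punished_q([1.0, 3.0], beta=0.0)   # ensemble mean
pessimistic = punished_q([1.0, 3.0], beta=1.0)  # mean minus one std
```

Varying `beta` moves the target between weaker and stronger pessimism, which is the trade-off between overestimation and underestimation that the abstract describes.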

Code & Models

Repositories

ihuhuhu/RAC (PyTorch, official)

Taxonomy

Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning

Methods: Convolution · Average Pooling · Global Average Pooling · Dilated Convolution · 1x1 Convolution · Switchable Atrous Convolution