False Correlation Reduction for Offline Reinforcement Learning

Zhihong Deng; Zuyue Fu; Lingxiao Wang; Zhuoran Yang; Chenjia Bai; Tianyi Zhou; Zhaoran Wang; Jing Jiang

arXiv:2110.12468·cs.LG·November 2, 2023

TL;DR

This paper introduces SCORE, an offline RL algorithm that reduces false correlations between epistemic uncertainty and decision-making. It reports state-of-the-art results on D4RL and comes with a proof of convergence to the optimal policy.

Contribution

SCORE is a new algorithm that effectively reduces false correlations in offline RL, with theoretical guarantees and superior empirical performance.

Findings

01. Achieves 3.1x acceleration and state-of-the-art performance on D4RL benchmarks.

02. Introduces an annealing behavior cloning regularizer for better uncertainty estimation.

03. Proves convergence to the optimal policy under mild assumptions.

Abstract

Offline reinforcement learning (RL) harnesses the power of massive datasets for resolving sequential decision problems. Most existing papers only discuss defending against out-of-distribution (OOD) actions while we investigate a broader issue, the false correlations between epistemic uncertainty and decision-making, an essential factor that causes suboptimality. In this paper, we propose falSe COrrelation REduction (SCORE) for offline RL, a practically effective and theoretically provable algorithm. We empirically show that SCORE achieves the SoTA performance with 3.1x acceleration on various tasks in a standard benchmark (D4RL). The proposed algorithm introduces an annealing behavior cloning regularizer to help produce a high-quality estimation of uncertainty which is critical for eliminating false correlations from suboptimality. Theoretically, we justify the rationality of the…
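The abstract names two ingredients: an uncertainty estimate used to act pessimistically, and a behavior cloning regularizer whose influence is annealed during training. The sketch below illustrates how such pieces could fit together; it is not the authors' implementation. It assumes, purely for illustration, that uncertainty is measured as the standard deviation across an ensemble of Q-value estimates and that the behavior cloning weight decays geometrically. The function names, the `beta` penalty coefficient, and the schedule parameters are all hypothetical.

```python
import statistics


def bc_weight(step, initial_weight=1.0, decay=0.5, decay_every=1000):
    """Hypothetical annealing schedule: shrink the BC weight by `decay`
    once every `decay_every` training steps."""
    return initial_weight * decay ** (step // decay_every)


def pessimistic_value(q_ensemble, beta=1.0):
    """Ensemble mean minus beta times the ensemble standard deviation.
    The standard deviation stands in for epistemic uncertainty (an assumption)."""
    mean = statistics.fmean(q_ensemble)
    std = statistics.pstdev(q_ensemble)
    return mean - beta * std


def actor_loss(q_ensemble, policy_action, dataset_action, step, beta=1.0):
    """Loss to minimise: negative pessimistic value plus an annealed
    squared-error behavior cloning term."""
    bc_error = sum((p - d) ** 2 for p, d in zip(policy_action, dataset_action))
    return -pessimistic_value(q_ensemble, beta) + bc_weight(step) * bc_error
```

Early in training the behavior cloning term dominates and keeps the policy close to the dataset actions; as its weight shrinks, the uncertainty-penalized value takes over. Whether SCORE uses this particular schedule or uncertainty measure is not stated in the text above.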

Code & Models

Repositories

yifan123/arxiv_spider

Taxonomy

Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Advanced Bandit Algorithms Research