False Correlation Reduction for Offline Reinforcement Learning
Zhihong Deng, Zuyue Fu, Lingxiao Wang, Zhuoran Yang, Chenjia Bai, Tianyi Zhou, Zhaoran Wang, Jing Jiang

TL;DR
This paper introduces SCORE, an offline RL algorithm that reduces false correlations between epistemic uncertainty and decision-making. It reaches state-of-the-art results on the D4RL benchmark and comes with a convergence guarantee.
Contribution
SCORE is a new offline RL algorithm that reduces false correlations between epistemic uncertainty and decision-making, backed by theoretical guarantees and strong empirical performance on D4RL.
Findings
Achieves 3.1x acceleration and state-of-the-art performance on D4RL benchmarks.
Introduces an annealing behavior cloning regularizer for better uncertainty estimation.
Proves convergence to the optimal policy under mild assumptions.
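The annealing behavior cloning regularizer named in the findings can be illustrated with a minimal sketch. This is not taken from the paper: the function names, the step-wise geometric decay schedule, and all constants below are assumptions chosen for illustration, and SCORE's uncertainty penalty is left out entirely. The sketch only shows the general idea of a behavior cloning term whose weight shrinks as training proceeds, so the policy stays close to the dataset actions early on and relies more on the value estimate later.

```python
def bc_weight(step, initial_weight=1.0, decay=0.5, decay_every=1000):
    """Hypothetical annealing schedule (an assumption, not the paper's):
    multiply the behavior cloning weight by `decay` every `decay_every` steps."""
    return initial_weight * decay ** (step // decay_every)


def regularized_actor_loss(q_values, policy_actions, dataset_actions, step):
    """Batch mean of -Q(s, pi(s)) + lambda_t * ||pi(s) - a||^2.

    q_values:        list of floats, Q(s, pi(s)) for each sample
    policy_actions:  list of action vectors produced by the policy
    dataset_actions: list of action vectors stored in the offline dataset
    step:            current training step, used to anneal lambda_t
    """
    lam = bc_weight(step)
    total = 0.0
    for q, pi_a, a in zip(q_values, policy_actions, dataset_actions):
        # Squared distance between the policy action and the dataset action.
        bc_penalty = sum((p - b) ** 2 for p, b in zip(pi_a, a))
        total += -q + lam * bc_penalty
    return total / len(q_values)
```

With these assumed defaults the same batch is penalized less for deviating from the dataset as `step` grows, since `bc_weight` halves every 1000 steps.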
Abstract
Offline reinforcement learning (RL) harnesses the power of massive datasets for resolving sequential decision problems. Most existing papers only discuss defending against out-of-distribution (OOD) actions while we investigate a broader issue, the false correlations between epistemic uncertainty and decision-making, an essential factor that causes suboptimality. In this paper, we propose falSe COrrelation REduction (SCORE) for offline RL, a practically effective and theoretically provable algorithm. We empirically show that SCORE achieves the SoTA performance with 3.1x acceleration on various tasks in a standard benchmark (D4RL). The proposed algorithm introduces an annealing behavior cloning regularizer to help produce a high-quality estimation of uncertainty which is critical for eliminating false correlations from suboptimality. Theoretically, we justify the rationality of the…
Taxonomy
Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Advanced Bandit Algorithms Research
