Offline Policy Optimization in RL with Variance Regularization

Riashat Islam; Samarth Sinha; Homanga Bharadhwaj; Samin Yeasar Arnob; Zhuoran Yang; Animesh Garg; Zhaoran Wang; Lihong Li; Doina Precup

arXiv:2212.14405·cs.LG·January 2, 2023

TL;DR

This paper introduces a variance regularizer for offline RL that mitigates the value over-estimation caused by distributional shift, and reports improved policy learning on continuous control tasks.

Contribution

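The authors propose offline variance regularization (OVAR), a variance regularizer for offline RL built on stationary distribution corrections. Fenchel duality is used to compute the regularizer's gradient without double sampling. OVAR can augment existing offline policy optimization algorithms, and it yields a lower bound on the offline policy optimization objective that helps avoid over-estimation errors.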

Findings

01 The regularizer yields a lower bound on the offline policy optimization objective
02 Improved performance over state-of-the-art algorithms
03 Effective in continuous control domains

Abstract

Learning policies from fixed offline datasets is a key challenge in scaling up reinforcement learning (RL) algorithms towards practical applications. This is often because off-policy RL algorithms suffer from distributional shift, due to a mismatch between the dataset and the target policy, leading to high variance and over-estimation of value functions. In this work, we propose variance regularization for offline RL algorithms, using stationary distribution corrections. We show that by using Fenchel duality, we can avoid double sampling issues when computing the gradient of the variance regularizer. The proposed algorithm for offline variance regularization (OVAR) can be used to augment any existing offline policy optimization algorithm. We show that the regularizer leads to a lower bound on the offline policy optimization objective, which can help avoid over-estimation errors, and explains…
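The double sampling issue mentioned in the abstract can be illustrated with a generic identity; the sketch below is not the paper's implementation. For a random variable X, Var(X) = E[X^2] - (E[X])^2, and the gradient of the squared expectation is a product of two expectations, so an unbiased estimate needs two independent samples. Applying Fenchel duality to the square gives Var(X) = min over nu of E[(X - nu)^2], a single expectation that can be estimated from one sample at a time. The names `variance_dual` and `regularized_objective`, the penalty weight `lam`, and the random stand-in correction weights are all assumptions made for illustration.

```python
import numpy as np

def variance_dual(x, nu):
    """Dual form E[(x - nu)^2]; minimized over nu it equals Var(x)."""
    return np.mean((x - nu) ** 2)

def regularized_objective(rewards, weights, lam):
    """Hypothetical penalized objective: mean(w * r) - lam * Var(w * r)."""
    x = weights * rewards
    nu_star = x.mean()  # closed-form minimizer of the dual variable
    return x.mean() - lam * variance_dual(x, nu_star)

rng = np.random.default_rng(0)
rewards = rng.normal(1.0, 0.5, size=10_000)
# Stand-in for distribution correction ratios (illustrative only).
weights = rng.uniform(0.5, 1.5, size=10_000)
x = weights * rewards

# At its minimizer, the dual form matches the population variance.
assert np.isclose(variance_dual(x, x.mean()), x.var())
# Any other nu gives an upper bound on the variance.
assert variance_dual(x, x.mean() + 0.3) > x.var()
# Subtracting a non-negative penalty can only lower the objective.
assert regularized_objective(rewards, weights, 0.1) <= x.mean()
```

In a learning setting the dual variable would typically be updated by gradient steps alongside the policy rather than solved in closed form; the closed form is used here only to keep the check short.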

Taxonomy

Topics: Reinforcement Learning in Robotics · Smart Grid Energy Management · Advanced Bandit Algorithms Research