Offline Reinforcement Learning with Closed-Form Policy Improvement Operators

Jiachen Li; Edwin Zhang; Ming Yin; Qinxun Bai; Yu-Xiang Wang; William Yang Wang

arXiv:2211.15956 · cs.LG · July 25, 2023 · 1 citation

TL;DR

This paper introduces closed-form policy improvement operators for offline reinforcement learning. They leverage behavior constraints and Gaussian mixture models of the behavior policy, and outperform state-of-the-art algorithms on the D4RL benchmark.

Contribution

The paper proposes novel closed-form policy improvement operators for offline RL, built on a first-order Taylor approximation of the policy objective and on Gaussian mixture models of the behavior policies.

Findings

1. Outperforms state-of-the-art algorithms on the D4RL benchmark
2. Introduces a linear approximation of the policy objective
3. Provides a closed-form solution for policy improvement
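The linear approximation and closed-form step listed above can be illustrated with a minimal sketch. It assumes a single Gaussian behavior policy N(mu, Sigma) at a given state and expresses the behavior constraint as the trust region (a - mu)^T Sigma^{-1} (a - mu) <= 2*delta; the paper's exact constraint and its Gaussian-mixture variant are not reproduced here. Replacing Q(s, a) by its first-order Taylor expansion around mu leaves a linear objective over an ellipsoid, whose maximizer is mu + sqrt(2*delta) * Sigma g / sqrt(g^T Sigma g), where g is the action-gradient of Q at mu. The helper name closed_form_step and the parameter delta are hypothetical.

```python
import numpy as np


def closed_form_step(mu, sigma, grad_q, delta):
    """Hypothetical helper (not the authors' code).

    Maximizes the first-order Taylor expansion
        Q(s, mu) + grad_q^T (a - mu)
    subject to the assumed trust region
        (a - mu)^T sigma^{-1} (a - mu) <= 2 * delta,
    where N(mu, sigma) is a single Gaussian behavior policy.
    """
    # On the ellipsoid x^T sigma^{-1} x <= r^2, the linear objective g^T x
    # is maximized in the direction sigma @ g.
    sigma_g = sigma @ grad_q
    # ||g||_sigma = sqrt(g^T sigma g) normalizes the step onto the boundary.
    norm = np.sqrt(grad_q @ sigma_g)
    if norm == 0.0:
        # Flat linearization: no preferred direction, stay at the behavior mean.
        return mu.copy()
    return mu + np.sqrt(2.0 * delta) * sigma_g / norm


# Example: the step stretches along the high-variance action dimension.
a = closed_form_step(np.zeros(2), np.diag([1.0, 4.0]), np.array([0.0, 1.0]), 0.5)
print(a)
```

With mu = 0, Sigma = diag(1, 4), g = (0, 1) and delta = 0.5, the step returns (0, 2): the action moves further along the dimension where the behavior policy has more variance. Because the objective is linear, no inner gradient-ascent loop is needed; the update is a single matrix-vector product plus a normalization.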

Abstract

Behavior constrained policy optimization has been demonstrated to be a successful paradigm for tackling Offline Reinforcement Learning. By exploiting historical transitions, a policy is trained to maximize a learned value function while constrained by the behavior policy to avoid a significant distributional shift. In this paper, we propose our closed-form policy improvement operators. We make a novel observation that the behavior constraint naturally motivates the use of first-order Taylor approximation, leading to a linear approximation of the policy objective. Additionally, as practical datasets are usually collected by heterogeneous policies, we model the behavior policies as a Gaussian mixture and overcome the induced optimization difficulties by leveraging a lower bound of LogSumExp and Jensen's inequality, giving rise to a closed-form policy improvement operator. We instantiate…
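The Gaussian-mixture difficulty described in the abstract can be made concrete: the log-density of a mixture, log sum_i w_i N_i(a), is a LogSumExp over components and does not reduce to a simple quadratic constraint on the action. The sketch below assumes diagonal covariances and uses hypothetical helper names; it only checks numerically the two standard lower bounds the abstract names: LogSumExp >= max_i (log w_i + log N_i(a)), and, by Jensen's inequality for the concave log, log sum_i w_i N_i(a) >= sum_i w_i log N_i(a). How the paper turns each bound into its operator is not reproduced here.

```python
import numpy as np


def log_gauss_diag(a, mu, var):
    # log N(a; mu, diag(var)) for a diagonal-covariance Gaussian.
    return -0.5 * np.sum(np.log(2.0 * np.pi * var) + (a - mu) ** 2 / var)


def mixture_log_density_and_bounds(a, weights, mus, vars_):
    """Hypothetical helper: exact mixture log-density and two lower bounds."""
    comp = np.array([log_gauss_diag(a, m, v) for m, v in zip(mus, vars_)])
    log_w = np.log(weights)
    # Exact value: log sum_i w_i N_i(a), computed stably as a LogSumExp.
    exact = np.logaddexp.reduce(log_w + comp)
    # LogSumExp lower bound: log sum_i exp(x_i) >= max_i x_i.
    lse_bound = np.max(log_w + comp)
    # Jensen (log is concave, weights sum to 1): log E[N_i] >= E[log N_i].
    jensen_bound = np.sum(weights * comp)
    return exact, lse_bound, jensen_bound


weights = np.array([0.3, 0.7])
mus = [np.zeros(2), np.ones(2)]
vars_ = [np.ones(2), 2.0 * np.ones(2)]
print(mixture_log_density_and_bounds(np.array([0.5, -0.2]), weights, mus, vars_))
```

Both bounds are easier to constrain than the exact log-density. Requiring the max bound to exceed a threshold gives a union of per-component ellipsoids, so each component can be treated like a single Gaussian. Requiring the Jensen bound to exceed a threshold gives a single quadratic constraint in a, since each log N_i(a) is quadratic in a.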

Taxonomy

Topics: Reinforcement Learning in Robotics · Machine Learning and Data Classification · Machine Learning and ELM