Offline Reinforcement Learning with Closed-Form Policy Improvement Operators
Jiachen Li, Edwin Zhang, Ming Yin, Qinxun Bai, Yu-Xiang Wang, William Yang Wang

TL;DR
This paper introduces closed-form policy improvement operators for offline reinforcement learning. It linearizes the behavior-constrained policy objective, models the behavior policy as a Gaussian mixture, and reports outperforming state-of-the-art algorithms on the D4RL benchmarks.
Contribution
The paper proposes novel closed-form policy improvement operators using first-order Taylor approximation and Gaussian mixture models for behavior policies in offline RL.
Findings
Outperforms state-of-the-art algorithms on D4RL benchmarks
Introduces a linear approximation of the policy objective
Provides a closed-form solution for policy improvement
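To make the "linear approximation" and "closed-form solution" findings concrete, here is a minimal sketch for the simplest case only. It assumes a single Gaussian behavior policy with mean `mu_b` and covariance `sigma_b`, and it expresses the behavior constraint as a Mahalanobis ball of radius `radius` around that mean. The function name, the parameters, and the toy quadratic Q-function are hypothetical illustrations, not the paper's operator or code.

```python
import numpy as np


def linearized_improvement(mu_b, sigma_b, grad_q, radius):
    """Closed-form maximizer of a linearized objective (illustrative sketch).

    Solves  max_a  grad_q . a
            s.t.   (a - mu_b)^T sigma_b^{-1} (a - mu_b) <= radius**2,
    where grad_q stands for the gradient of Q with respect to the action,
    evaluated at the behavior mean (the first-order Taylor expansion point).
    All names here are assumptions made for this example.
    """
    mu_b = np.asarray(mu_b, dtype=float)
    sigma_b = np.asarray(sigma_b, dtype=float)
    grad_q = np.asarray(grad_q, dtype=float)

    scaled = sigma_b @ grad_q           # sigma_b * grad_q
    norm = np.sqrt(grad_q @ scaled)     # sqrt(grad_q^T sigma_b grad_q)
    if norm == 0.0:
        # Zero gradient: the linear objective is flat, so stay at the mean.
        return mu_b.copy()
    # Lagrangian solution: step along sigma_b * grad_q to the constraint boundary.
    return mu_b + radius * scaled / norm


if __name__ == "__main__":
    # Toy Q(a) = -||a - target||^2, so grad_a Q(a) = -2 (a - target).
    target = np.array([1.0, -0.5])
    mu_b = np.zeros(2)
    sigma_b = np.diag([0.25, 1.0])
    grad_q = -2.0 * (mu_b - target)
    print(linearized_improvement(mu_b, sigma_b, grad_q, radius=0.5))
```

The update moves the behavior mean along the covariance-scaled gradient and stops at the constraint boundary, so no inner gradient-ascent loop is needed in this simplified setting.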
Abstract
Behavior constrained policy optimization has been demonstrated to be a successful paradigm for tackling Offline Reinforcement Learning. By exploiting historical transitions, a policy is trained to maximize a learned value function while constrained by the behavior policy to avoid a significant distributional shift. In this paper, we propose our closed-form policy improvement operators. We make a novel observation that the behavior constraint naturally motivates the use of first-order Taylor approximation, leading to a linear approximation of the policy objective. Additionally, as practical datasets are usually collected by heterogeneous policies, we model the behavior policies as a Gaussian Mixture and overcome the induced optimization difficulties by leveraging the LogSumExp's lower bound and Jensen's Inequality, giving rise to a closed-form policy improvement operator. We instantiate…
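For reference, the two inequalities named in the abstract can be written out for a Gaussian mixture behavior policy. The notation below (mixture weights \lambda_i, component means \mu_i, covariances \Sigma_i) is assumed for illustration; the truncated abstract does not show how the paper combines these bounds.

```latex
% Assumed notation: \pi_\beta(a \mid s) = \sum_{i=1}^{N} \lambda_i \,
%   \mathcal{N}(a; \mu_i, \Sigma_i), \quad \lambda_i \ge 0, \quad \sum_i \lambda_i = 1.

% LogSumExp lower bound (the log of a sum of exponentials is at least the largest exponent):
\log \pi_\beta(a \mid s)
  = \log \sum_{i=1}^{N} \exp\bigl(\log \lambda_i + \log \mathcal{N}(a; \mu_i, \Sigma_i)\bigr)
  \ge \max_{i} \bigl\{ \log \lambda_i + \log \mathcal{N}(a; \mu_i, \Sigma_i) \bigr\}

% Jensen's inequality (concavity of the logarithm):
\log \pi_\beta(a \mid s)
  = \log \sum_{i=1}^{N} \lambda_i \, \mathcal{N}(a; \mu_i, \Sigma_i)
  \ge \sum_{i=1}^{N} \lambda_i \log \mathcal{N}(a; \mu_i, \Sigma_i)
```

Each right-hand side is built from Gaussian log-densities, which are quadratic in the action a. Constraining either lower bound instead of the mixture log-density therefore leaves a problem with quadratic constraints, the same kind that admits a closed-form solution in the single-Gaussian case.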
Taxonomy
Topics: Reinforcement Learning in Robotics · Machine Learning and Data Classification · Machine Learning and ELM
