A Behavior Regularized Implicit Policy for Offline Reinforcement Learning
Shentao Yang, Zhendong Wang, Huangjie Zheng, Yihao Feng, Mingyuan Zhou

TL;DR
This paper introduces an offline reinforcement learning framework that trains a flexible, fully implicit policy while regularizing it toward the behavior observed in the dataset, using a modified policy-matching objective backed by theoretical guarantees.
Contribution
It proposes a regularized implicit policy framework together with a modified policy-matching method that regularizes with respect to the dual form of the Jensen-Shannon divergence and the integral probability metrics, supported by theoretical analysis and a practical GAN-based implementation.
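To make "implicit policy" concrete, here is a minimal sketch, assuming the usual GAN-generator reading of the term: actions are produced by pushing random noise through a network together with the state, so the policy can be sampled but its density is never written in closed form. The function name `make_implicit_policy`, the layer sizes, and the random untrained weights are all hypothetical and are not taken from the paper.

```python
import numpy as np

def make_implicit_policy(state_dim, action_dim, noise_dim=4, hidden=32, seed=0):
    """Return a sampler a = g(s, z) with z ~ N(0, I). Illustrative only."""
    rng = np.random.default_rng(seed)
    fan_in = state_dim + noise_dim
    # Untrained random weights; a real policy would learn these.
    W1 = rng.normal(scale=1.0 / np.sqrt(fan_in), size=(fan_in, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(scale=1.0 / np.sqrt(hidden), size=(hidden, action_dim))
    b2 = np.zeros(action_dim)

    def sample(states, noise_rng):
        # Fresh noise per state makes the action distribution stochastic
        # without ever specifying its density explicitly.
        z = noise_rng.normal(size=(states.shape[0], noise_dim))
        h = np.maximum(0.0, np.concatenate([states, z], axis=1) @ W1 + b1)  # ReLU
        return np.tanh(h @ W2 + b2)  # squash actions into [-1, 1]

    return sample

policy = make_implicit_policy(state_dim=3, action_dim=2)
states = np.ones((5, 3))  # the same state repeated five times
actions = policy(states, np.random.default_rng(1))
```

Calling `policy` on the same state several times yields different actions because each call draws new noise, which is what lets such a policy represent flexible, possibly multi-modal action distributions.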
Findings
The method outperforms existing approaches on the D4RL benchmarks.
The proposed approach demonstrates strong finite-sample properties.
Ablation studies confirm the effectiveness of each component.
Abstract
Offline reinforcement learning enables learning from a fixed dataset, without further interactions with the environment. The lack of environmental interactions makes the policy training vulnerable to state-action pairs far from the training dataset and prone to missing rewarding actions. For training more effective agents, we propose a framework that supports learning a flexible yet well-regularized fully-implicit policy. We further propose a simple modification to the classical policy-matching methods for regularizing with respect to the dual form of the Jensen-Shannon divergence and the integral probability metrics. We theoretically show the correctness of the policy-matching approach, and the correctness and a good finite-sample property of our modification. An effective instantiation of our framework through the GAN structure is provided, together with techniques to explicitly…
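As background for the "dual form of the Jensen-Shannon divergence" mentioned in the abstract, the sketch below checks the standard GAN identity on a small discrete example: JSD(P, Q) = log 2 + (1/2) max_D { E_P[log D] + E_Q[log(1 - D)] }, with the maximum attained at D* = p / (p + q). This is the textbook identity only, not the paper's modification (which the truncated abstract does not spell out). The distributions `p` and `q` and the helper names are made up for illustration, and strictly positive probabilities are assumed.

```python
import numpy as np

def js_divergence(p, q):
    """Jensen-Shannon divergence from its definition (natural log)."""
    m = 0.5 * (p + q)
    kl = lambda a, b: float(np.sum(a * np.log(a / b)))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def js_dual_value(p, q, d):
    """Dual objective for a discriminator d with values in (0, 1).

    This is a lower bound on the JS divergence for any d, and it is
    tight when d equals the optimal discriminator p / (p + q).
    """
    return float(np.log(2.0) + 0.5 * (np.sum(p * np.log(d)) + np.sum(q * np.log(1.0 - d))))

# Two toy distributions over three outcomes (hypothetical values).
p = np.array([0.7, 0.2, 0.1])
q = np.array([0.1, 0.3, 0.6])

d_star = p / (p + q)        # optimal discriminator
d_other = np.full(3, 0.5)   # an uninformative discriminator

js_direct = js_divergence(p, q)
js_dual_opt = js_dual_value(p, q, d_star)
js_dual_other = js_dual_value(p, q, d_other)
```

Here `js_dual_opt` matches `js_direct` up to floating-point error, while the uninformative discriminator gives a dual value of zero, a valid but loose lower bound. In a GAN-style implementation the discriminator is a trained network, so the dual value is only an estimate of the divergence.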
Taxonomy
Topics: Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning · Adversarial Robustness in Machine Learning
