Hypercube Policy Regularization Framework for Offline Reinforcement Learning
Yi Shen, Hanyan Huang

TL;DR
This paper introduces a hypercube policy regularization framework for offline reinforcement learning, allowing more flexible policy exploration in static datasets and improving performance over existing methods.
Contribution
The paper proposes a novel hypercube policy regularization framework that alleviates over-conservativeness in policy constraints, enhancing algorithm effectiveness in low-quality datasets.
Findings
Outperforms state-of-the-art algorithms on D4RL datasets
Is supported by a theoretical argument that it improves the performance of the original algorithms
Enhances policy exploration in static datasets
Abstract
Offline reinforcement learning has received extensive attention because it learns a policy from a static dataset and so avoids interaction between the agent and the environment. However, general reinforcement learning methods do not achieve satisfactory results in the offline setting because of out-of-distribution state-action pairs that the dataset does not cover during training. To address this problem, policy regularization methods, which try to directly clone the policies that generated the static dataset, have been studied extensively for their simplicity and effectiveness. However, policy constraint methods make the agent choose the corresponding actions in the static dataset. This type of constraint is usually over-conservative and results in suboptimal policies, especially on low-quality static datasets. In this paper, a hypercube policy regularization framework is…
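The over-conservativeness that the abstract attributes to policy constraint methods can be illustrated with a toy example. The sketch below is not the paper's hypercube framework, whose details are not given on this page. It is a minimal, hypothetical one-dimensional objective in the spirit of behavior-cloning regularizers, and the critic `toy_q`, the penalty `weight`, and the grid search are all assumptions made for illustration.

```python
def toy_q(action, best_action=1.0):
    """Hypothetical critic: value is highest when action == best_action."""
    return -(action - best_action) ** 2


def constrained_actor_loss(action, dataset_action, weight):
    """Trade off critic value against squared distance to the dataset action."""
    return -toy_q(action) + weight * (action - dataset_action) ** 2


def best_constrained_action(dataset_action, weight, steps=3001):
    """Grid-search the action in [-1, 2] that minimizes the constrained loss."""
    candidates = [-1.0 + 3.0 * i / (steps - 1) for i in range(steps)]
    return min(
        candidates,
        key=lambda a: constrained_actor_loss(a, dataset_action, weight),
    )


if __name__ == "__main__":
    # Assumed low-quality dataset: the logged action (0.0) is far from the
    # critic's preferred action (1.0).
    for weight in (0.0, 1.0, 9.0):
        chosen = best_constrained_action(dataset_action=0.0, weight=weight)
        print(f"weight={weight}: chosen action is about {chosen:.3f}")
```

For this quadratic toy case the minimizer is (1 + weight * dataset_action) / (1 + weight), so a larger weight pulls the chosen action away from the critic's optimum and toward the logged action. When the logged action is poor, as in a low-quality dataset, a strong constraint of this kind keeps the policy near that poor action.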
Taxonomy
Topics: Reinforcement Learning in Robotics
Methods: Softmax · Attention Is All You Need · Implicit Q-Learning
