Hypercube Policy Regularization Framework for Offline Reinforcement Learning

Yi Shen; Hanyan Huang

arXiv:2411.04534·cs.LG·February 12, 2026

TL;DR

This paper introduces a hypercube policy regularization framework for offline reinforcement learning that lets the agent explore more flexibly within a static dataset and improves on existing methods on D4RL datasets.

Contribution

The paper proposes a novel hypercube policy regularization framework that alleviates the over-conservativeness of policy constraints, making the underlying algorithms more effective on low-quality datasets.

Findings

01 Outperforms state-of-the-art algorithms on D4RL datasets

02 Theoretically improves original algorithm performance

03 Enhances policy exploration in static datasets

Abstract

Offline reinforcement learning has received extensive attention because it learns a policy from a static dataset and so avoids interaction between the agent and the environment. However, general reinforcement learning methods do not achieve satisfactory results in the offline setting because of out-of-distribution state-action pairs that the dataset does not cover during training. To address this problem, policy regularization methods, which try to directly clone the policies used in static datasets, have been studied extensively because of their simplicity and effectiveness. However, policy constraint methods make the agent choose the corresponding actions in the static dataset. This type of constraint is usually over-conservative, which results in suboptimal policies, especially in low-quality static datasets. In this paper, a hypercube policy regularization framework is…

Code & Models

Repositories

lastTarnished/Hypyercud-Policy-Regularization
PyTorch · Official

Taxonomy

Topics: Reinforcement Learning in Robotics

Methods: Softmax · Attention Is All You Need · Implicit Q-Learning