State-Constrained Offline Reinforcement Learning
Charles A. Hepburn, Yue Jin, Giovanni Montana

TL;DR
This paper introduces state-constrained offline RL, enabling policies to leverage out-of-distribution actions that lead to in-distribution states, thereby expanding learning capabilities and improving performance on benchmark datasets.
Contribution
The paper proposes a novel state-constrained offline RL framework and introduces StaCQ, a deep learning algorithm that achieves state-of-the-art results and aligns with new theoretical insights.
Findings
StaCQ achieves state-of-the-art performance on D4RL benchmarks.
Theoretical analysis supports the effectiveness of state-constrained offline RL.
The framework allows combining trajectories more effectively.
Abstract
Traditional offline reinforcement learning (RL) methods predominantly operate in a batch-constrained setting. This confines the algorithms to a specific state-action distribution present in the dataset, reducing the effects of distributional shift but restricting the policy to seen actions. In this paper, we alleviate this limitation by introducing state-constrained offline RL, a novel framework that focuses solely on the dataset's state distribution. This approach allows the policy to take high-quality out-of-distribution actions that lead to in-distribution states, significantly enhancing learning potential. The proposed setting not only broadens the learning horizon but also improves the ability to combine different trajectories from the dataset effectively, a desirable property inherent in offline RL. Our research is underpinned by theoretical findings that pave the way for…
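The difference between the batch-constrained and state-constrained settings described in the abstract can be illustrated on a toy problem. The sketch below is not StaCQ and is not taken from the paper. It assumes a hypothetical five-state chain with known deterministic dynamics and rewards (the `step` function), a hand-written dataset that only ever takes the action +1, and plain tabular value iteration. All names (`step`, `allowed_batch`, `allowed_state`, `value_iteration`) are made up for illustration.

```python
GAMMA = 0.9          # discount factor (assumed)
ACTIONS = (1, 2)     # move one or two states to the right
GOAL = 4             # terminal state of the hypothetical chain


def step(s, a):
    """Assumed known deterministic dynamics: reward 1 on entering GOAL."""
    s_next = min(s + a, GOAL)
    reward = 1.0 if s_next == GOAL else 0.0
    return s_next, reward


# Hand-written dataset of (state, action) pairs: the behaviour policy
# only ever used action +1, so action +2 is out-of-distribution everywhere.
dataset = [(0, 1), (1, 1), (2, 1), (3, 1)]
seen_pairs = set(dataset)
seen_states = {s for s, _ in dataset} | {step(s, a)[0] for s, a in dataset}


def allowed_batch(s):
    """Batch-constrained: only actions seen with this state in the dataset."""
    return [a for a in ACTIONS if (s, a) in seen_pairs]


def allowed_state(s):
    """State-constrained: any action whose next state appears in the dataset."""
    return [a for a in ACTIONS if step(s, a)[0] in seen_states]


def value_iteration(allowed, n_iters=50):
    """Tabular value iteration restricted to the actions returned by `allowed`."""
    v = {s: 0.0 for s in seen_states}
    for _ in range(n_iters):
        for s in seen_states:
            if s == GOAL:
                continue  # terminal state keeps value 0
            candidates = []
            for a in allowed(s):
                s_next, r = step(s, a)
                candidates.append(r + GAMMA * v[s_next])
            v[s] = max(candidates) if candidates else 0.0
    return v


v_batch = value_iteration(allowed_batch)
v_state = value_iteration(allowed_state)
print("batch-constrained V(0):", round(v_batch[0], 3))
print("state-constrained V(0):", round(v_state[0], 3))
```

Under these assumptions the batch-constrained learner must follow the dataset's +1 actions all the way to the goal, so the value of state 0 is about 0.729. The state-constrained learner may also take the unseen +2 action, because it lands on states that do appear in the dataset, and reaches a value of about 0.9. A practical method would have to learn the dynamics and rewards rather than assume them.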
Taxonomy
Topics: Smart Grid Energy Management · Elevator Systems and Control · Reinforcement Learning in Robotics
