Offline Constrained Reinforcement Learning under Partial Data Coverage

Seokmin Ko; Ambuj Tewari; Kihyuk Hong

arXiv:2505.17506·stat.ML·May 13, 2026

Offline Constrained Reinforcement Learning under Partial Data Coverage

Seokmin Ko, Ambuj Tewari, Kihyuk Hong

PDF

Abstract

We study offline constrained reinforcement learning with general function approximation in discounted constrained Markov decision processes. Prior methods either require full data coverage for evaluating intermediate policies, lack oracle efficiency, or requires the knowledge of data-generating distribution for policy extraction. We propose PDOCRL, an oracle-efficient primal-dual algorithm based on a decomposed linear-programming formulation that makes the policy an explicit optimization variable. This avoids policy extraction that requires the knowledge of data-generating distribution, and only uses standard policy-optimization, online linear-optimization, and linear-minimization oracles. We show that saddle-point formulations using general function approximation can have spurious saddle points even when an optimal solution is realizable, and identify a stronger realizability condition…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.