A Primal-Dual-Critic Algorithm for Offline Constrained Reinforcement Learning
Kihyuk Hong, Yuhang Li, Ambuj Tewari

TL;DR
This paper introduces PDCA, a primal-dual-critic algorithm for offline constrained reinforcement learning that finds near-optimal policies while requiring only concentrability and realizability, not the strong Bellman completeness assumption used by previous methods.
Contribution
The paper presents PDCA, a novel offline constrained RL algorithm using a primal-dual approach with general function approximation, requiring weaker assumptions than prior work.
Findings
PDCA can find near saddle points of the Lagrangian, leading to near-optimal policies.
The algorithm is sample-efficient under concentrability and realizability assumptions.
It does not require strong Bellman completeness, unlike previous methods.
Abstract
Offline constrained reinforcement learning (RL) aims to learn a policy that maximizes the expected cumulative reward subject to constraints on expected cumulative cost using an existing dataset. In this paper, we propose Primal-Dual-Critic Algorithm (PDCA), a novel algorithm for offline constrained RL with general function approximation. PDCA runs a primal-dual algorithm on the Lagrangian function estimated by critics. The primal player employs a no-regret policy optimization oracle to maximize the Lagrangian estimate and the dual player acts greedily to minimize the Lagrangian estimate. We show that PDCA can successfully find a near saddle point of the Lagrangian, which is nearly optimal for the constrained RL problem. Unlike previous work that requires concentrability and a strong Bellman completeness assumption, PDCA only requires concentrability and realizability assumptions for…
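The primal-dual loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it makes several assumptions that are not in the source: the problem is reduced to a single state (a bandit), exact rewards and costs stand in for the learned critics, the constraint is written as expected cost ≤ `threshold`, the Lagrangian is taken to be L(π, λ) = J_r(π) + λ (threshold − J_c(π)) with λ in [0, `bound`], and a Hedge (multiplicative weights) update is used as one example of a no-regret policy optimization oracle. The function name and all parameters are hypothetical.

```python
import numpy as np

def primal_dual_sketch(reward, cost, threshold, bound=10.0, steps=2000, lr=0.01):
    """Toy primal-dual loop on a single-state problem.

    Assumptions (not from the paper): exact per-action rewards and costs
    replace the critic estimates, and the constraint is cost <= threshold.
    """
    reward = np.asarray(reward, dtype=float)
    cost = np.asarray(cost, dtype=float)
    logits = np.zeros(len(reward))
    policy_sum = np.zeros(len(reward))

    for _ in range(steps):
        # Current policy: softmax of the accumulated logits.
        policy = np.exp(logits - logits.max())
        policy /= policy.sum()

        # Dual player acts greedily: L is linear in lam, so its minimizer
        # over [0, bound] is bound when the constraint is violated, else 0.
        lam = bound if policy @ cost > threshold else 0.0

        # Primal player: Hedge step on the per-action Lagrangian payoff.
        logits += lr * (reward - lam * cost)

        policy_sum += policy

    # The average of the iterates approximates a saddle point.
    return policy_sum / steps

avg_policy = primal_dual_sketch(reward=[1.0, 0.0], cost=[1.0, 0.0], threshold=0.5)
print("average policy:", avg_policy)
```

In this toy instance the first action earns reward but also incurs cost, so the best feasible policy plays it about half the time. The averaged policy should settle near that mixture, with the gap shrinking as `lr` decreases.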
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Adversarial Robustness in Machine Learning
