A Primal-Dual-Critic Algorithm for Offline Constrained Reinforcement Learning

Kihyuk Hong; Yuhang Li; Ambuj Tewari

arXiv:2306.07818 · cs.LG · October 23, 2023 · 1 citation

Open Access

TL;DR

This paper introduces PDCA, a primal-dual-critic algorithm for offline constrained reinforcement learning. PDCA finds near-optimal policies under concentrability and realizability assumptions, without the strong Bellman completeness assumption that previous methods require.

Contribution

The paper presents PDCA, a novel offline constrained RL algorithm using a primal-dual approach with general function approximation, requiring weaker assumptions than prior work.

Findings

01. PDCA can find a near saddle point of the Lagrangian, which is nearly optimal for the constrained RL problem.

02. The algorithm is sample-efficient under concentrability and realizability assumptions.

03. Unlike previous methods, it does not require a strong Bellman completeness assumption.

Abstract

Offline constrained reinforcement learning (RL) aims to learn a policy that maximizes the expected cumulative reward subject to constraints on expected cumulative cost using an existing dataset. In this paper, we propose Primal-Dual-Critic Algorithm (PDCA), a novel algorithm for offline constrained RL with general function approximation. PDCA runs a primal-dual algorithm on the Lagrangian function estimated by critics. The primal player employs a no-regret policy optimization oracle to maximize the Lagrangian estimate and the dual player acts greedily to minimize the Lagrangian estimate. We show that PDCA can successfully find a near saddle point of the Lagrangian, which is nearly optimal for the constrained RL problem. Unlike previous work that requires concentrability and a strong Bellman completeness assumption, PDCA only requires concentrability and realizability assumptions for…
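The primal-dual interaction described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes a one-state problem (a bandit) in which the true per-action reward and cost stand in for the critics' estimates, it uses a Hedge (multiplicative-weights) update as the no-regret primal oracle, and it assumes the Lagrangian has the form L(π, λ) = R(π) + λ(threshold − C(π)) with λ restricted to [0, bound]. The function name `primal_dual_sketch` and all of its parameters are hypothetical.

```python
import numpy as np


def primal_dual_sketch(reward, cost, threshold, bound=10.0, steps=2000, lr=0.05):
    """Toy primal-dual loop on a one-state problem; returns the average policy.

    Assumptions (not from the paper): exact reward/cost vectors replace the
    critic estimates, and Hedge plays the role of the no-regret primal oracle.
    """
    reward = np.asarray(reward, dtype=float)
    cost = np.asarray(cost, dtype=float)
    logits = np.zeros(len(reward))
    avg_policy = np.zeros(len(reward))

    for _ in range(steps):
        # Current primal policy: softmax over the accumulated payoffs.
        policy = np.exp(logits - logits.max())
        policy /= policy.sum()

        # Dual player acts greedily: it minimizes lam * (threshold - C(policy))
        # over [0, bound], so it picks the upper bound when the constraint is
        # violated and zero otherwise.
        lam = bound if policy @ cost > threshold else 0.0

        # Primal player: Hedge update on the per-action Lagrangian payoff.
        logits += lr * (reward - lam * cost)

        avg_policy += policy

    # The time-averaged policy approximates a saddle point of the Lagrangian.
    return avg_policy / steps


if __name__ == "__main__":
    # Action 0 has high reward and high cost; action 1 is free but unrewarding.
    pi = primal_dual_sketch(reward=[1.0, 0.0], cost=[1.0, 0.0], threshold=0.5)
    print("average policy:", pi)
    print("expected cost:", pi @ np.array([1.0, 0.0]))
```

In this toy setting the averaged policy mixes the two actions so that its expected cost stays close to the threshold. In the offline setting the abstract describes, the reward and cost terms would instead come from critics fitted to the dataset.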

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Adversarial Robustness in Machine Learning