Confident Natural Policy Gradient for Local Planning in $q_\pi$-realizable Constrained MDPs

Tian Tian; Lin F. Yang; Csaba Szepesvári

arXiv:2406.18529·cs.LG·December 11, 2024

TL;DR

This paper introduces a novel algorithm for efficiently learning constrained policies in large state spaces using linear function approximation, ensuring safety constraints are met while optimizing rewards.

Contribution

It presents the first algorithm with polynomial sample complexity for $q_\pi$-realizable CMDPs, using a primal-dual approach with off-policy evaluation.

Findings

01

Achieves polynomial sample complexity in the $q_\pi$-realizable setting.

02

Ensures policies satisfy constraints with high probability.

03

Uses off-policy evaluation to efficiently update policies.

Abstract

The constrained Markov decision process (CMDP) framework emerges as an important reinforcement learning approach for imposing safety or other critical objectives while maximizing cumulative reward. However, the current understanding of how to learn efficiently in a CMDP environment with a potentially infinite number of states remains under investigation, particularly when function approximation is applied to the value functions. In this paper, we address the learning problem given linear function approximation with $q_\pi$-realizability, where the value functions of all policies are linearly representable with a known feature map, a setting known to be more general and challenging than other linear settings. Utilizing a local-access model, we propose a novel primal-dual algorithm that, after $\tilde{O}(\mathrm{poly}(d)\,\epsilon^{-3})$ queries, outputs with high probability a policy that…
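
The abstract names the ingredients (linear action-value estimates, a policy-gradient primal step, a dual variable for the constraint) but does not spell out the update rules. The following is a minimal, generic primal-dual sketch under stated assumptions, not the paper's algorithm: the constraint is assumed to have the form "constraint value at least a threshold", the primal step is the standard softmax (multiplicative-weights) form of natural policy gradient at a single state, and every function and parameter name here is hypothetical.

```python
import math


def linear_q(features, weights):
    """Linear action-value estimate q(s, a) ~ <phi(s, a), w> (assumed form)."""
    return sum(f * w for f, w in zip(features, weights))


def lagrangian_q(q_reward, q_constraint, lam):
    """Combine reward and constraint action values with multiplier lam."""
    return [qr + lam * qc for qr, qc in zip(q_reward, q_constraint)]


def softmax_npg_step(policy_probs, q_values, step_size):
    """Primal step at one state: pi'(a) proportional to pi(a) * exp(step_size * q(a))."""
    m = max(q_values)  # subtract the max for numerical stability
    unnorm = [p * math.exp(step_size * (q - m))
              for p, q in zip(policy_probs, q_values)]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def dual_step(lam, constraint_value, threshold, step_size, lam_max):
    """Dual step: projected gradient descent on lam, clipped to [0, lam_max].

    Assumes the constraint reads constraint_value >= threshold, so lam
    grows when the constraint is violated and shrinks when it has slack.
    """
    return min(lam_max, max(0.0, lam - step_size * (constraint_value - threshold)))
```

In a full loop one would alternate the two steps, feeding `lagrangian_q` into `softmax_npg_step` and the estimated constraint value into `dual_step`; how those estimates are obtained from local-access queries is exactly what the paper addresses and is not modeled here.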

Videos

Confident Natural Policy Gradient for Local Planning in $q_\pi$-realizable Constrained MDPs · SlidesLive

Taxonomy

Topics: Logic, Reasoning, and Knowledge · Formal Methods in Verification · Optimization and Search Problems