Deterministic Policies for Constrained Reinforcement Learning in Polynomial Time

Jeremy McMahan

arXiv:2405.14183·cs.LG·November 1, 2024

TL;DR

This paper introduces a polynomial-time algorithm for computing near-optimal deterministic policies in constrained reinforcement learning, addressing longstanding open questions across various constraint types.

Contribution

It presents a fully polynomial-time approximation scheme (FPTAS) for time-space recursive (TSR) cost criteria, combining value-demand augmentation, action-space approximate dynamic programming, and time-space rounding.

Findings

01 Provides the first polynomial-time approximation scheme for constrained RL policies

02 Addresses open questions on polynomial-time approximability for various constraints

03 Enables efficient computation of deterministic policies under complex constraints

Abstract

We present a novel algorithm that efficiently computes near-optimal deterministic policies for constrained reinforcement learning (CRL) problems. Our approach combines three key ideas: (1) value-demand augmentation, (2) action-space approximate dynamic programming, and (3) time-space rounding. Our algorithm constitutes a fully polynomial-time approximation scheme (FPTAS) for any time-space recursive (TSR) cost criterion. A TSR criterion requires the cost of a policy to be computable recursively over both time and (state) space; this class includes the classical expectation, almost-sure, and anytime constraints. Our work answers three open questions spanning two long-standing lines of research: polynomial-time approximability is possible for (1) anytime-constrained policies, (2) almost-sure-constrained policies, and (3) deterministic expectation-constrained policies.
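
To make "computable recursively over both time and (state) space" concrete, here is a minimal sketch of evaluating the cost of a fixed deterministic policy by backward recursion. It is not the paper's algorithm: the toy MDP, the names `P`, `cost`, `policy`, and `policy_cost`, and the choice of a weighted sum for the expectation criterion and a maximum over reachable successors for a worst-case (almost-sure style) criterion are all assumptions made for illustration.

```python
# Hypothetical 2-state, horizon-2 MDP; every number here is illustrative.
# P[s][a] is a list of (next_state, probability); cost[s][a] is the immediate cost.
P = {
    0: {0: [(0, 1.0)], 1: [(0, 0.5), (1, 0.5)]},
    1: {0: [(1, 1.0)], 1: [(0, 1.0)]},
}
cost = {0: {0: 0.0, 1: 1.0}, 1: {0: 2.0, 1: 0.0}}
H = 2

# Deterministic, time-dependent policy: policy[h][s] is the action at step h.
policy = [{0: 1, 1: 0}, {0: 1, 1: 0}]


def expected(succ):
    """Combine successor costs by probability weighting (expectation criterion)."""
    return sum(p * c for p, c in succ)


def worst_case(succ):
    """Combine successor costs by the maximum over reachable successors."""
    return max(c for _, c in succ)


def policy_cost(combine):
    """Backward recursion over time (h) and space (successor states)."""
    C = {s: 0.0 for s in P}  # cost-to-go after the final step
    for h in reversed(range(H)):
        new_C = {}
        for s in P:
            a = policy[h][s]
            # (probability, cost-to-go) for each successor with positive probability
            succ = [(p, C[s2]) for s2, p in P[s][a] if p > 0]
            new_C[s] = cost[s][a] + combine(succ)
        C = new_C
    return C


print(policy_cost(expected))    # {0: 2.5, 1: 4.0}
print(policy_cost(worst_case))  # {0: 3.0, 1: 4.0}
```

Only the `combine` step differs between the two criteria, which is the sense in which a single recursive template can cover several constraint types. Checking a policy against a budget would then amount to comparing the value at the initial state with that budget.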

Taxonomy

Topics: Reinforcement Learning in Robotics