Deterministic Policies for Constrained Reinforcement Learning in Polynomial Time
Jeremy McMahan

TL;DR
This paper introduces a polynomial-time algorithm for computing near-optimal deterministic policies in constrained reinforcement learning, resolving long-standing open questions for anytime, almost-sure, and deterministic expectation constraints.
Contribution
It presents a fully polynomial-time approximation scheme (FPTAS) for any time-space recursive (TSR) cost criterion, combining value-demand augmentation, action-space approximate dynamic programming, and time-space rounding.
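To see why augmenting the state helps when searching over deterministic policies, here is a minimal Python sketch of a simpler, exact cousin of the idea. For an anytime constraint with small nonnegative integer costs, augmenting the state with the cumulative cost so far turns the search for an optimal deterministic policy into ordinary backward dynamic programming. This is a swapped-in illustration, not the paper's value-demand augmentation or its rounding; the tabular encoding and the names `anytime_dp`, `P`, `r`, `c`, and `B` are hypothetical. Its table has one entry per budget level, so it is only pseudo-polynomial in the cost range, which is the sort of blow-up that approximation schemes typically sidestep by rounding.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical tabular encoding (an assumption, not from the paper):
# P[s][a] -> list of (next_state, probability); r[s][a] -> reward;
# c[s][a] -> nonnegative integer cost; B -> integer budget.
Transitions = Dict[int, Dict[int, List[Tuple[int, float]]]]


def anytime_dp(P: Transitions, r: Dict[int, Dict[int, float]],
               c: Dict[int, Dict[int, int]], H: int, B: int, s0: int):
    """Exact DP over the cost-augmented state (s, k), k = cumulative cost so far.

    Returns the optimal expected reward from (s0, 0) and a deterministic
    policy mapping (h, (s, k)) -> action. -inf marks an infeasible pair.
    """
    states = list(P.keys())
    # Values at time H: every augmented state with k <= B is feasible, reward 0.
    V = {(s, k): 0.0 for s in states for k in range(B + 1)}
    policy: List[Dict[Tuple[int, int], int]] = [{} for _ in range(H)]
    for h in reversed(range(H)):
        new_V = {}
        for s in states:
            for k in range(B + 1):
                best, best_a = -math.inf, None
                for a, outcomes in P[s].items():
                    k2 = k + c[s][a]
                    if k2 > B:  # cumulative cost would exceed the budget now
                        continue
                    succ = [(s2, p) for s2, p in outcomes if p > 0]
                    # Every reachable successor must remain feasible afterwards.
                    if any(V[(s2, k2)] == -math.inf for s2, _ in succ):
                        continue
                    q = r[s][a] + sum(p * V[(s2, k2)] for s2, p in succ)
                    if q > best:
                        best, best_a = q, a
                new_V[(s, k)] = best
                if best_a is not None:
                    policy[h][(s, k)] = best_a
        V = new_V
    return V[(s0, 0)], policy


# Tiny hypothetical example: one state, a "work" action (reward 1, cost 1)
# and an "idle" action (reward 0, cost 0), horizon 3, budget 2.
P_demo = {0: {0: [(0, 1.0)], 1: [(0, 1.0)]}}
r_demo = {0: {0: 1.0, 1: 0.0}}
c_demo = {0: {0: 1, 1: 0}}
value, policy = anytime_dp(P_demo, r_demo, c_demo, H=3, B=2, s0=0)
```

Because the budget index `k` is part of the state, the returned policy is deterministic but depends on the cost incurred so far, not only on the time step and the current state. With nonnegative costs, checking `k2 <= B` at every step enforces the running-cost bound along every reachable path.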
Findings
Provides the first polynomial-time approximation scheme for near-optimal deterministic policies in constrained RL
Answers open questions on polynomial-time approximability for anytime, almost-sure, and deterministic expectation constraints
Enables efficient computation of deterministic policies under any time-space recursive cost constraint
Abstract
We present a novel algorithm that efficiently computes near-optimal deterministic policies for constrained reinforcement learning (CRL) problems. Our approach combines three key ideas: (1) value-demand augmentation, (2) action-space approximate dynamic programming, and (3) time-space rounding. Our algorithm constitutes a fully polynomial-time approximation scheme (FPTAS) for any time-space recursive (TSR) cost criterion. A TSR criterion requires the cost of a policy to be computable recursively over both time and (state) space, a class that includes classical expectation, almost-sure, and anytime constraints. Our work answers three open questions spanning two long-standing lines of research: polynomial-time approximability is possible for 1) anytime-constrained policies, 2) almost-sure-constrained policies, and 3) deterministic expectation-constrained policies.
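The TSR requirement is easiest to see on the three named criteria. The minimal Python sketch below evaluates a fixed deterministic, time-dependent policy on a small tabular finite-horizon MDP and computes each criterion by backward recursion: a state's value at time h depends only on its own cost and the time-(h+1) values of its positive-probability successors. The dictionary encoding and `tsr_costs` are hypothetical. The anytime recursion assumes the anytime cost is the largest running (cumulative) cost reached at any step along any reachable trajectory; with nonnegative costs it coincides with the almost-sure (worst-case total) cost. TSR criteria include these three but are not limited to them.

```python
from typing import Dict, List, Tuple

# Hypothetical tabular encoding (an assumption, not from the paper):
# P[s][a] -> list of (next_state, probability), with at least one p > 0;
# c[s][a] -> scalar cost; policy[h][s] -> action of a deterministic policy.
Transitions = Dict[int, Dict[int, List[Tuple[int, float]]]]


def tsr_costs(P: Transitions, c: Dict[int, Dict[int, float]],
              policy: List[Dict[int, int]], H: int):
    """Evaluate three cost criteria of a fixed policy by backward recursion.

    Each state's time-h value is built only from its cost and the time-(h+1)
    values of its positive-probability successors.
    """
    states = list(P.keys())
    expected = {s: 0.0 for s in states}  # expected total cost-to-go
    sure = {s: 0.0 for s in states}      # worst total cost over reachable paths
    anytime = {s: 0.0 for s in states}   # worst running cost over reachable paths
    for h in reversed(range(H)):
        new_exp, new_sure, new_any = {}, {}, {}
        for s in states:
            a = policy[h][s]
            succ = [(s2, p) for s2, p in P[s][a] if p > 0]
            cost = c[s][a]
            new_exp[s] = cost + sum(p * expected[s2] for s2, p in succ)
            new_sure[s] = cost + max(sure[s2] for s2, _ in succ)
            # The running maximum either peaks at this step (add 0) or later,
            # along the worst reachable successor.
            new_any[s] = cost + max(0.0, max(anytime[s2] for s2, _ in succ))
        expected, sure, anytime = new_exp, new_sure, new_any
    return expected, sure, anytime


# Tiny hypothetical example with a negative cost, so "anytime" differs from "almost sure".
P_demo = {0: {0: [(1, 0.5), (2, 0.5)]}, 1: {0: [(1, 1.0)]}, 2: {0: [(2, 1.0)]}}
c_demo = {0: {0: 1.0}, 1: {0: -2.0}, 2: {0: 3.0}}
policy_demo = [{0: 0, 1: 0, 2: 0} for _ in range(2)]
expected, sure, anytime = tsr_costs(P_demo, c_demo, policy_demo, H=2)
```

From state 0, the path through state 1 has running costs 1 then -1 and the path through state 2 has 1 then 4, so state 0 has expected cost 1.5 and both worst-case costs equal 4. From state 1 the running costs are -2 then -4, so its almost-sure cost is -4 while its anytime cost is -2.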
