Polynomial-Time Approximability of Constrained Reinforcement Learning

Jeremy McMahan

arXiv:2502.07764·cs.DS·February 12, 2025

TL;DR

This paper gives a polynomial-time approximation algorithm for general constrained Markov decision processes, resolving several open questions about the computational complexity of constrained reinforcement learning.

Contribution

It is the first to prove polynomial-time approximability for several constrained RL settings, including chance and expectation constraints, with guarantees that are optimal assuming P ≠ NP.

Findings

01

Provides a $(0, \epsilon)$-additive bicriteria approximation algorithm.

02

Establishes matching lower bounds, which imply the approximation guarantees are optimal assuming P ≠ NP.

03

Answers several long-standing open complexity questions in constrained RL.
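
The $(0, \epsilon)$-additive bicriteria guarantee named in the first finding can be read as follows. This is a sketch under the usual convention for bicriteria approximation, not a definition quoted from the paper. The symbols are assumed notation for a maximization problem with a single upper-bounded cost: $V^{\pi}$ is the value of policy $\pi$, $C^{\pi}$ is its constraint cost, $B$ is the budget, and $V^{*}$ is the best value among policies with $C^{\pi} \le B$.

```latex
% Assumed notation (not taken from the paper): V^pi, C^pi, B, V^*.
\[
  \pi \text{ is an } (\alpha, \beta)\text{-additive bicriteria approximation}
  \iff
  V^{\pi} \ge V^{*} - \alpha
  \quad \text{and} \quad
  C^{\pi} \le B + \beta .
\]
```

Under this reading, $(\alpha, \beta) = (0, \epsilon)$ means the returned policy loses nothing in value relative to the best feasible policy while exceeding the budget by at most $\epsilon$.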

Abstract

We study the computational complexity of approximating general constrained Markov decision processes. Our primary contribution is the design of a polynomial-time $(0, \epsilon)$-additive bicriteria approximation algorithm for finding optimal constrained policies across a broad class of recursively computable constraints, including almost-sure, chance, expectation, and their anytime variants. Matching lower bounds imply our approximation guarantees are optimal so long as $P \neq NP$. The generality of our approach results in answers to several long-standing open complexity questions in the constrained reinforcement learning literature. Specifically, we are the first to prove polynomial-time approximability for the following settings: policies under chance constraints, deterministic policies under multiple expectation constraints, policies under non-homogeneous constraints (i.e.,…

Code & Models

Models

atbender/Qwen3-REAP-15B-A3B-W4A16-custom-calib (Hugging Face model, 24 downloads)

Videos

Polynomial-Time Approximability of Constrained Reinforcement Learning (SlidesLive)

Taxonomy

Topics: Elevator Systems and Control