Anytime-Constrained Reinforcement Learning

Jeremy McMahan; Xiaojin Zhu

arXiv:2311.05511 · cs.LG · June 14, 2024 · 1 citation

TL;DR

This paper introduces a new framework for reinforcement learning with anytime constraints, providing algorithms for planning and learning in such settings, along with complexity results and approximation methods.

Contribution

The paper develops a fixed-parameter tractable reduction from anytime-constrained cMDPs to unconstrained MDPs and proposes efficient approximation algorithms with provable guarantees.
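The idea behind the reduction can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes a finite-horizon tabular cMDP with deterministic integer costs, and every name in it (`solve_anytime_constrained`, `P`, `R`, `C`, `budget`) is hypothetical. The state is augmented with the cost spent so far, any action that would push the cumulative cost over the budget is pruned, and ordinary backward induction is run on the resulting unconstrained problem.

```python
from functools import lru_cache
import math


def solve_anytime_constrained(P, R, C, H, budget):
    """Backward induction over augmented states (t, state, cost spent so far).

    Hypothetical inputs for this sketch:
      P[s][a] -- list of (next_state, probability) pairs
      R[s][a] -- immediate reward
      C[s][a] -- deterministic integer cost (an assumption of this sketch)
    Returns a memoized function value(t, s, spent) -> (best value, best action).
    """
    n_actions = len(P[0])

    @lru_cache(maxsize=None)
    def value(t, s, spent):
        if t == H:
            return 0.0, None
        best_v, best_a = -math.inf, None
        for a in range(n_actions):
            new_spent = spent + C[s][a]
            if new_spent > budget:
                continue  # taking this action would violate the budget now
            v = R[s][a]
            for s_next, p in P[s][a]:
                if p > 0:
                    # A successor with value -inf makes the action infeasible,
                    # which enforces the constraint almost surely.
                    v += p * value(t + 1, s_next, new_spent)[0]
            if v > best_v:
                best_v, best_a = v, a
        return best_v, best_a

    return value


# Toy instance: one state, action 0 is free, action 1 pays reward 1 at cost 1.
P = [[[(0, 1.0)], [(0, 1.0)]]]
R = [[0.0, 1.0]]
C = [[0, 1]]
value = solve_anytime_constrained(P, R, C, H=3, budget=1)

print(value(0, 0, 0))  # (1.0, 0): only one costly action fits in the budget
print(value(2, 0, 0))  # (1.0, 1): budget still unspent, take the costly action
print(value(2, 0, 1))  # (0.0, 0): budget already spent, only the free action
```

In the toy instance the best action at the last step differs with the cost already spent even though the underlying state is the same, which is why the policy needs the cumulative cost as an extra input. The number of reachable cumulative costs can grow quickly with the horizon, so this brute-force enumeration is only practical when the costs take few distinct values.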

Findings

01 Optimal deterministic policies exist once policies are augmented with cumulative costs.

02 The reduction is fixed-parameter tractable, and the resulting algorithms are efficient for tabular cMDPs when the cost precision is logarithmic in the size of the cMDP.

03 Computing non-trivial approximately optimal policies is NP-hard in general.

Abstract

We introduce and study constrained Markov Decision Processes (cMDPs) with anytime constraints. An anytime constraint requires the agent to never violate its budget at any point in time, almost surely. Although Markovian policies are no longer sufficient, we show that there exist optimal deterministic policies augmented with cumulative costs. In fact, we present a fixed-parameter tractable reduction from anytime-constrained cMDPs to unconstrained MDPs. Our reduction yields planning and learning algorithms that are time and sample-efficient for tabular cMDPs so long as the precision of the costs is logarithmic in the size of the cMDP. However, we also show that computing non-trivial approximately optimal policies is NP-hard in general. To circumvent this bottleneck, we design provable approximation algorithms that efficiently compute or learn an arbitrarily accurate approximately feasible…
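For concreteness, the anytime constraint described above can be written as follows. The notation is an assumption made here for illustration (horizon $H$, budget $B$, cost $c_k$ incurred at step $k$, policy $\pi$) and may differ from the paper's own symbols.

```latex
% Anytime constraint: every prefix of the cost sequence stays within budget,
% with probability one under the policy \pi.
\[
  \mathbb{P}^{\pi}\!\left[\, \forall t \in \{1,\dots,H\} :\ \sum_{k=1}^{t} c_k \le B \,\right] = 1 .
\]
% For comparison, a standard expectation constraint only bounds the average total cost:
\[
  \mathbb{E}^{\pi}\!\left[\, \sum_{k=1}^{H} c_k \,\right] \le B .
\]
% Cost-augmented policies choose actions from the state together with the cost so far:
\[
  a_t = \pi_t\!\left(s_t,\ \bar{c}_t\right), \qquad \bar{c}_t = \sum_{k=1}^{t-1} c_k .
\]
```

The first condition must hold on every trajectory that occurs with positive probability, which is why a policy that looks only at the current state can be insufficient while one that also tracks $\bar{c}_t$ is enough.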

Code & Models

Repositories

jermcmahan/anytime-constraints (Official)

Taxonomy

Topics: Reinforcement Learning in Robotics