Anytime-Constrained Reinforcement Learning
Jeremy McMahan, Xiaojin Zhu

TL;DR
This paper introduces a framework for reinforcement learning with anytime constraints. These constraints require the cumulative cost to stay within budget at every time step, almost surely. The paper provides planning and learning algorithms for this setting, along with complexity results and approximation methods.
Contribution
It develops a reduction from anytime-constrained cMDPs to unconstrained MDPs and proposes efficient algorithms with provable guarantees for approximate solutions.
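The reduction can be made concrete with a minimal Python sketch. It rests on simplifying assumptions of our own: a finite horizon, deterministic integer costs, and a tiny hypothetical two-state cMDP whose numbers are made up.

The sketch works as follows:
- The state is augmented with the cumulative cost `c`.
- Any action that would push `c` above the budget `B` is pruned.
- An action with a reachable successor that is itself infeasible gets value `-inf`.
- Backward induction (here, memoized recursion) on the augmented MDP then returns the best value achievable without ever exceeding the budget.

The names `value`, `q_value`, and `best_action` are illustrative, not the paper's.

```python
import math
from functools import lru_cache

# Hypothetical toy cMDP; all numbers are illustrative assumptions.
H = 3  # horizon
B = 2  # anytime budget: every partial cost sum must stay <= B
reward = {0: {0: 1.0, 1: 3.0}, 1: {0: 0.0, 1: 2.0}}
cost = {0: {0: 0, 1: 2}, 1: {0: 0, 1: 1}}  # deterministic integer costs
P = {  # P[s][a] maps next_state -> probability
    0: {0: {0: 1.0}, 1: {1: 1.0}},
    1: {0: {0: 0.5, 1: 0.5}, 1: {1: 1.0}},
}


def q_value(h, s, c, a):
    """Value of taking action a at time h in augmented state (s, c)."""
    nc = c + cost[s][a]
    if nc > B:  # this action would violate the budget immediately
        return -math.inf
    q = reward[s][a]
    for s2, p in P[s][a].items():
        if p == 0:
            continue
        v = value(h + 1, s2, nc)
        if v == -math.inf:  # a reachable successor cannot stay feasible
            return -math.inf
        q += p * v
    return q


@lru_cache(maxsize=None)
def value(h, s, c):
    """Optimal value from time h in augmented state (s, c); -inf if infeasible."""
    if h == H:
        return 0.0
    return max(q_value(h, s, c, a) for a in reward[s])


def best_action(h, s, c):
    """Greedy action in the augmented MDP, or None if no feasible action exists."""
    q = {a: q_value(h, s, c, a) for a in reward[s]}
    a = max(q, key=q.get)
    return None if q[a] == -math.inf else a
```

With these numbers, `value(0, 0, 0)` is 5.0. Meanwhile, `best_action(2, 0, 0)` returns 1 and `best_action(2, 0, 2)` returns 0. The same time step and state call for different actions depending on the cost already spent, which is the intuition behind conditioning policies on cumulative cost.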
Findings
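Optimal deterministic policies exist once the state is augmented with the cumulative cost.
The reduction to unconstrained MDPs is fixed-parameter tractable, yielding time- and sample-efficient algorithms for tabular cMDPs when cost precision is logarithmic in the size of the cMDP.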
Computing non-trivial approximately optimal policies is NP-hard in general.
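A back-of-the-envelope count shows why logarithmic cost precision keeps the cost-augmented state space small. For illustration only, assume:
- integer per-step costs in $\{0, \dots, 2^k - 1\}$ (k bits of precision);
- horizon $H$ and state set $S$;
- a cMDP of size $n$.

These symbols and this bound are a rough sketch, not the paper's exact statement. Write $\bar S$ for the augmented state set, that is, the state plus the cumulative cost so far.

```latex
\[
c_1 + \dots + c_h \in \{0, 1, \dots, H(2^k - 1)\} \quad (h \le H)
\quad\Longrightarrow\quad
|\bar S| \le |S|\,\bigl(H(2^k - 1) + 1\bigr),
\]
\[
k \le \gamma \log_2 n \;\Longrightarrow\; 2^k \le n^{\gamma}
\;\Longrightarrow\; |\bar S| = O\bigl(|S|\,H\,n^{\gamma}\bigr).
\]
```

The augmented MDP is therefore polynomial in $n$ whenever $k = O(\log n)$. For general $k$, the count can grow exponentially in the number of precision bits.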
Abstract
We introduce and study constrained Markov Decision Processes (cMDPs) with anytime constraints. An anytime constraint requires the agent to never violate its budget at any point in time, almost surely. Although Markovian policies are no longer sufficient, we show that there exist optimal deterministic policies augmented with cumulative costs. In fact, we present a fixed-parameter tractable reduction from anytime-constrained cMDPs to unconstrained MDPs. Our reduction yields planning and learning algorithms that are time and sample-efficient for tabular cMDPs so long as the precision of the costs is logarithmic in the size of the cMDP. However, we also show that computing non-trivial approximately optimal policies is NP-hard in general. To circumvent this bottleneck, we design provable approximation algorithms that efficiently compute or learn an arbitrarily accurate approximately feasible…
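The abstract's description of the approximation algorithms is cut off above. The following therefore illustrates only one standard idea, cost rounding, without claiming it is the paper's exact scheme.

Suppose every per-step cost is rounded down to a multiple of a grid size `delta`. Two properties follow:
- Any trajectory that respects the budget under the true costs also respects it under the rounded costs.
- A trajectory whose rounded partial sums stay within the budget overshoots it by at most `H * (delta - 1)` in true cost, assuming integer cost units.

Rounding also shrinks the number of distinct cumulative costs by roughly a factor of `delta`. The hypothetical helpers `round_down` and `max_violation` check the overshoot bound empirically on random cost sequences.

```python
import random


def round_down(cost: int, delta: int) -> int:
    """Round an integer cost down to the nearest multiple of delta."""
    return (cost // delta) * delta


def max_violation(costs, delta, budget):
    """For a cost sequence whose rounded partial sums all stay <= budget,
    return how far the largest true partial sum exceeds the budget (0 if never)."""
    true_sum = rounded_sum = 0
    worst = 0
    for c in costs:
        true_sum += c
        rounded_sum += round_down(c, delta)
        assert rounded_sum <= budget, "infeasible even under rounded costs"
        worst = max(worst, true_sum - budget)
    return worst


# Hypothetical parameters: horizon H, grid size delta, budget (integer cost units).
H, delta, budget = 5, 50, 2000
bound = H * (delta - 1)  # each rounding step loses at most delta - 1 units

random.seed(0)
observed = 0
for _ in range(10_000):
    costs = [random.randrange(0, 1000) for _ in range(H)]
    # Costs are non-negative, so checking the final rounded sum covers every prefix.
    if sum(round_down(c, delta) for c in costs) > budget:
        continue
    observed = max(observed, max_violation(costs, delta, budget))

print(f"largest observed violation: {observed} (bound: {bound})")
```

The grid size `delta` trades accuracy for size: a coarser grid gives fewer augmented states but a larger worst-case budget overshoot.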
Taxonomy
Topics: Reinforcement Learning in Robotics
