Towards Painless Policy Optimization for Constrained MDPs
Arushi Jain, Sharan Vaswani, Reza Babanezhad, Csaba Szepesvari, Doina Precup

TL;DR
This paper introduces a primal-dual framework for policy optimization in constrained Markov decision processes. It proves guarantees on reward sub-optimality and constraint violation for a new coin-betting algorithm that is robust and hyperparameter-free.
Contribution
The paper proposes a novel primal-dual framework and the Coin Betting Politex (CBP) algorithm for constrained MDPs. It supports them with theoretical bounds on reward sub-optimality and constraint violation, and with experiments showing robustness in practice.
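For orientation, here is a minimal statement of the saddle-point problem that generic primal-dual CMDP methods work with. The symbols are assumed notation, not copied from the paper: V_r^π is the reward value, V_c^π is the constraint value, b is the constraint threshold, ρ is the start-state distribution and λ is the Lagrange multiplier.

```latex
\max_{\pi}\; V_r^{\pi}(\rho) \quad \text{s.t.} \quad V_c^{\pi}(\rho) \ge b
\quad\Longleftrightarrow\quad
\max_{\pi}\; \min_{\lambda \ge 0}\; \mathcal{L}(\pi, \lambda)
  = V_r^{\pi}(\rho) + \lambda \bigl( V_c^{\pi}(\rho) - b \bigr)
```

The inner minimum equals V_r^π(ρ) when π is feasible (take λ = 0) and is unbounded below otherwise. A primal player therefore updates π against the Lagrangian, while a dual player updates λ. In this setting, a framework like the one described here bounds reward sub-optimality and constraint violation through the regret of each player.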
Findings
CBP achieves sublinear bounds on reward sub-optimality and constraint violation.
CBP does not require extensive hyperparameter tuning.
Experimental results demonstrate CBP's effectiveness and robustness.
Abstract
We study policy optimization in an infinite-horizon, γ-discounted constrained Markov decision process (CMDP). Our objective is to return a policy that achieves large expected reward with a small constraint violation. We consider the online setting with linear function approximation and assume global access to the corresponding features. We propose a generic primal-dual framework that allows us to bound the reward sub-optimality and constraint violation for arbitrary algorithms in terms of their primal and dual regret on online linear optimization problems. We instantiate this framework to use coin-betting algorithms and propose the Coin Betting Politex (CBP) algorithm. Assuming that the action-value functions are ε_b-close to the span of the d-dimensional state-action features and no sampling errors, we prove that T iterations of CBP result in an…
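The abstract instantiates the framework with coin-betting algorithms, whose appeal is that they need no step size. The sketch below illustrates that idea and is not the CBP algorithm itself. It implements the classic Krichevsky-Trofimov (KT) coin-betting learner of Orabona and Pál for one-dimensional online linear optimization with gradients in [-1, 1]. The class name `KTBettor` and its parameters are hypothetical. In a primal-dual loop, a learner of this kind could play the dual player choosing λ. Keeping λ ≥ 0, however, would require a constrained-domain reduction that this sketch omits.

```python
class KTBettor:
    """Krichevsky-Trofimov coin-betting learner for 1-D online linear optimization.

    Hypothetical illustration: losses are linear, g_t * x with |g_t| <= 1, and
    the coin outcome on round t is c_t = -g_t. There is no step size to tune.
    """

    def __init__(self, initial_wealth: float = 1.0) -> None:
        if initial_wealth <= 0:
            raise ValueError("initial_wealth must be positive")
        self.wealth = initial_wealth  # current wealth, stays > 0
        self.coin_sum = 0.0           # sum of past coin outcomes c_1..c_t
        self.t = 0                    # number of completed rounds

    def predict(self) -> float:
        # KT betting fraction: sum of past outcomes divided by (t + 1).
        # Since |coin_sum| <= t, the fraction lies strictly in (-1, 1).
        beta = self.coin_sum / (self.t + 1)
        return beta * self.wealth

    def update(self, x: float, g: float) -> None:
        if abs(g) > 1.0:
            raise ValueError("gradient must lie in [-1, 1]")
        c = -g                # coin outcome for this round
        self.wealth += c * x  # gain (or lose) c * x from the bet
        self.coin_sum += c
        self.t += 1


if __name__ == "__main__":
    bettor = KTBettor()
    for _ in range(4):
        x = bettor.predict()
        bettor.update(x, g=-1.0)  # constant gradient -1 rewards larger x
        print(f"x={x:.3f}  wealth={bettor.wealth:.3f}")
```

With a constant gradient of -1, the bets over four rounds are 0, 0.5, 1.0 and 1.875, and wealth grows from 1.0 to 4.375 without any learning rate being set. Because the betting fraction stays strictly inside (-1, 1), wealth remains positive for any gradient sequence in [-1, 1].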
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Machine Learning and Algorithms
