Towards Painless Policy Optimization for Constrained MDPs

Arushi Jain; Sharan Vaswani; Reza Babanezhad; Csaba Szepesvari; Doina Precup

arXiv:2204.05176 · cs.LG · April 12, 2022 · 1 citation

TL;DR

This paper introduces a primal-dual framework for policy optimization in constrained Markov decision processes. It bounds reward sub-optimality and constraint violation in terms of primal and dual regret, and instantiates the framework with a new coin-betting algorithm that does not require extensive hyperparameter tuning.

Contribution

The paper proposes a novel primal-dual framework and the Coin Betting Politex (CBP) algorithm for constrained MDPs, with theoretical bounds and practical robustness.

Findings

1. CBP achieves sublinear bounds on reward sub-optimality and constraint violation.

2. CBP does not require extensive hyperparameter tuning.

3. Experimental results demonstrate CBP's effectiveness and robustness.

Abstract

We study policy optimization in an infinite-horizon, $\gamma$-discounted constrained Markov decision process (CMDP). Our objective is to return a policy that achieves large expected reward with a small constraint violation. We consider the online setting with linear function approximation and assume global access to the corresponding features. We propose a generic primal-dual framework that allows us to bound the reward sub-optimality and constraint violation for arbitrary algorithms in terms of their primal and dual regret on online linear optimization problems. We instantiate this framework to use coin-betting algorithms and propose the Coin Betting Politex (CBP) algorithm. Assuming that the action-value functions are $\varepsilon_{b}$-close to the span of the $d$-dimensional state-action features and no sampling errors, we prove that $T$ iterations of CBP result in an…
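The abstract reduces the problem to online linear optimization and names coin-betting algorithms as the instantiation. As a rough illustration of why such methods need no step size, here is a minimal sketch of a generic one-dimensional Krichevsky-Trofimov (KT) coin-betting learner. It is not the CBP algorithm from the paper: the function name `kt_coin_betting`, the `initial_wealth` parameter, and the assumption that every gradient lies in [-1, 1] are illustrative choices made for this example.

```python
from typing import Iterable, List


def kt_coin_betting(gradients: Iterable[float],
                    initial_wealth: float = 1.0) -> List[float]:
    """Generic 1-D KT coin-betting learner for online linear optimization.

    Illustrative sketch only (not the paper's CBP). Assumes each gradient
    lies in [-1, 1]. Returns the iterate played in each round.
    """
    wealth = initial_wealth   # money available to bet; stays positive
    coin_sum = 0.0            # running sum of past coin outcomes
    iterates: List[float] = []
    for t, g in enumerate(gradients, start=1):
        if not -1.0 <= g <= 1.0:
            raise ValueError("gradients must lie in [-1, 1]")
        # KT betting fraction: average of past outcomes, always in (-1, 1).
        beta = coin_sum / t
        # The iterate is the signed bet: a fraction of the current wealth.
        w = beta * wealth
        iterates.append(w)
        # The coin outcome is the negative gradient; the bet wins c * w.
        c = -g
        wealth += c * w
        coin_sum += c
    return iterates


if __name__ == "__main__":
    # A constant gradient of -1 pushes the iterate steadily upward.
    print(kt_coin_betting([-1.0] * 4))
```

The learner has no learning rate: the size of each move is set by the accumulated wealth, which grows when past bets agreed with the observed gradients. This is the general property behind the "no extensive hyperparameter tuning" claim, under the assumptions stated above.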

Code & Models

Repositories

arushijain94/coinbettingpolitex (Official)

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Machine Learning and Algorithms