ACPO: A Policy Optimization Algorithm for Average MDPs with Constraints

Akhil Agnihotri; Rahul Jain; Haipeng Luo

arXiv:2302.00808 · cs.LG · May 27, 2024

Open Access

TL;DR

This paper introduces ACPO, a trust region-based policy optimization algorithm for constrained Markov Decision Processes under the average criterion (average-CMDPs). It targets a setting where algorithms designed for discounted constrained RL often perform poorly, and it is supported by theoretical performance guarantees and experiments in OpenAI Gym environments.

Contribution

The paper proposes ACPO, a new trust region-based policy optimization algorithm specifically designed for average-CMDPs, with theoretical analysis and extensive experimental validation.
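For orientation, the average-CMDP problem that such an algorithm addresses is conventionally written as below. This is the standard textbook formulation, not a quotation from the paper: the symbols $r$, $c_i$, $l_i$, $m$, $D$, and $\delta$ are notation assumed here, and the last line only shows the generic shape of a trust-region update, not ACPO's exact surrogate.

```latex
% Long-run average reward and average cost of a policy \pi
J_r(\pi) = \lim_{N \to \infty} \frac{1}{N}\,
  \mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{N-1} r(s_t, a_t) \right],
\qquad
J_{c_i}(\pi) = \lim_{N \to \infty} \frac{1}{N}\,
  \mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{N-1} c_i(s_t, a_t) \right]

% Average-CMDP: maximize average reward subject to average-cost limits
\max_{\pi} \; J_r(\pi)
\quad \text{s.t.} \quad J_{c_i}(\pi) \le l_i, \qquad i = 1, \dots, m

% Generic trust-region step: stay within a divergence ball around \pi_k
\pi_{k+1} = \arg\max_{\pi} \; J_r(\pi)
\quad \text{s.t.} \quad J_{c_i}(\pi) \le l_i, \quad D(\pi, \pi_k) \le \delta
```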

Findings

01. ACPO outperforms existing algorithms in OpenAI Gym environments.

02. Theoretical guarantees are established for ACPO's performance.

03. ACPO effectively handles constraints in average-CMDPs.

Abstract

Reinforcement Learning (RL) for constrained MDPs (CMDPs) is an increasingly important problem for various applications. Often, the average criterion is more suitable than the discounted criterion. Yet, RL for average-CMDPs (ACMDPs) remains a challenging problem. Algorithms designed for discounted constrained RL problems often do not perform well for the average CMDP setting. In this paper, we introduce a new policy optimization with function approximation algorithm for constrained MDPs with the average criterion. The Average-Constrained Policy Optimization (ACPO) algorithm is inspired by trust region-based policy optimization algorithms. We develop basic sensitivity theory for average CMDPs, and then use the corresponding bounds in the design of the algorithm. We provide theoretical guarantees on its performance, and through extensive experimental work in various challenging OpenAI Gym…
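To make the average criterion concrete, here is a minimal sketch that evaluates a fixed policy on a tiny CMDP under both the average and the discounted criterion. Everything in it is an illustrative assumption: the two-state transition, reward, and cost tables and the helper names (`induced_chain`, `stationary_distribution`, `average_value`, `discounted_value`) are hypothetical and do not come from the paper, and the code only evaluates policies; it is not ACPO.

```python
import numpy as np

# Hypothetical toy CMDP (not from the paper): 2 states, 2 actions.
# P[a, s, t] = probability of moving from state s to state t under action a.
P = np.array([
    [[0.9, 0.1], [0.2, 0.8]],  # action 0
    [[0.5, 0.5], [0.6, 0.4]],  # action 1
])
R = np.array([[1.0, 2.0], [0.0, 3.0]])  # R[s, a]: reward
C = np.array([[0.0, 1.0], [0.0, 1.0]])  # C[s, a]: cost (action 1 is "unsafe")


def induced_chain(policy):
    """State-to-state transition matrix under policy[s, a]."""
    return np.einsum("sa,ast->st", policy, P)


def stationary_distribution(P_pi):
    """Solve d = d @ P_pi with sum(d) = 1 as a least-squares system."""
    n = P_pi.shape[0]
    A = np.vstack([P_pi.T - np.eye(n), np.ones((1, n))])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    d, *_ = np.linalg.lstsq(A, b, rcond=None)
    return d


def average_value(policy, signal):
    """Long-run average of signal[s, a] (reward or cost) under the policy."""
    d = stationary_distribution(induced_chain(policy))
    return float(np.sum(d[:, None] * policy * signal))


def discounted_value(policy, signal, gamma, start):
    """Expected discounted sum of signal from the start-state distribution."""
    P_pi = induced_chain(policy)
    r_pi = np.sum(policy * signal, axis=1)
    v = np.linalg.solve(np.eye(len(r_pi)) - gamma * P_pi, r_pi)
    return float(start @ v)


if __name__ == "__main__":
    cost_limit = 0.5  # hypothetical constraint threshold
    policies = {
        "always action 1": np.array([[0.0, 1.0], [0.0, 1.0]]),
        "uniform": np.full((2, 2), 0.5),
    }
    for name, pi in policies.items():
        avg_r = average_value(pi, R)
        avg_c = average_value(pi, C)
        print(name, "avg reward:", avg_r, "avg cost:", avg_c,
              "feasible:", avg_c <= cost_limit)
```

The average value depends only on the stationary distribution of the induced chain, not on the start state or a discount factor. For an ergodic chain like this one, the discounted value scaled by `(1 - gamma)` approaches the average value as `gamma` tends to 1.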

Taxonomy

Topics: Reinforcement Learning in Robotics · Optimization and Search Problems · Formal Methods in Verification