ACPO: A Policy Optimization Algorithm for Average MDPs with Constraints
Akhil Agnihotri, Rahul Jain, Haipeng Luo

TL;DR
This paper introduces ACPO, a trust region-based policy optimization algorithm for constrained Markov Decision Processes under the average criterion (average-CMDPs). Algorithms designed for the discounted setting often perform poorly in this setting; ACPO comes with theoretical performance guarantees and outperforms existing algorithms in OpenAI Gym environments.
Contribution
The paper proposes ACPO, a new trust region-based policy optimization algorithm specifically designed for average-CMDPs, with theoretical analysis and extensive experimental validation.
Findings
ACPO outperforms existing algorithms in OpenAI Gym environments.
Theoretical guarantees are established for ACPO's performance.
ACPO effectively handles constraints in average-CMDPs.
Abstract
Reinforcement Learning (RL) for constrained MDPs (CMDPs) is an increasingly important problem for various applications. Often, the average criterion is more suitable than the discounted criterion. Yet, RL for average-CMDPs (ACMDPs) remains a challenging problem. Algorithms designed for discounted constrained RL problems often do not perform well for the average CMDP setting. In this paper, we introduce a new policy optimization with function approximation algorithm for constrained MDPs with the average criterion. The Average-Constrained Policy Optimization (ACPO) algorithm is inspired by trust region-based policy optimization algorithms. We develop basic sensitivity theory for average CMDPs, and then use the corresponding bounds in the design of the algorithm. We provide theoretical guarantees on its performance, and through extensive experimental work in various challenging OpenAI Gym…
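For readers unfamiliar with the average criterion, the following is a minimal sketch of how an average-CMDP problem is commonly written in the literature. The notation is an assumption made for illustration and is not taken from the paper: r is the reward function, c_i are m cost functions, l_i are their thresholds, and tau is a trajectory generated by policy pi. The paper's own trust-region update and sensitivity bounds are not reproduced here.

```latex
% Long-run average reward and average cost of a policy \pi
% (illustrative notation, not necessarily the paper's).
J_r(\pi) = \lim_{N \to \infty} \frac{1}{N}\,
  \mathbb{E}_{\tau \sim \pi}\!\left[ \sum_{t=0}^{N-1} r(s_t, a_t) \right],
\qquad
J_{c_i}(\pi) = \lim_{N \to \infty} \frac{1}{N}\,
  \mathbb{E}_{\tau \sim \pi}\!\left[ \sum_{t=0}^{N-1} c_i(s_t, a_t) \right]

% Average-CMDP: maximize average reward subject to average-cost constraints.
\max_{\pi} \; J_r(\pi)
\quad \text{subject to} \quad
J_{c_i}(\pi) \le l_i, \qquad i = 1, \dots, m
```

By contrast, the discounted criterion weights the reward at step t by gamma^t for some discount factor gamma in [0, 1), so early steps dominate; the average criterion weights all steps equally, which is why it can be the more natural choice for continuing tasks.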
Taxonomy
Topics: Reinforcement Learning in Robotics · Optimization and Search Problems · Formal Methods in Verification
