Accelerated Primal-Dual Policy Optimization for Safe Reinforcement Learning
Qingkai Liang, Fanyu Que, Eytan Modiano

TL;DR
This paper introduces Accelerated Primal-Dual Optimization (APDO), a policy search method for safe reinforcement learning in constrained Markov decision processes (CMDPs). APDO improves sample efficiency and convergence speed by incorporating off-policy training into the dual update.
Contribution
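The paper presents an off-policy dual-variable update for primal-dual policy optimization in CMDPs. The policy (primal) update stays on-policy, while the dual update uses an off-policy trained dual variable, which improves sample efficiency and speeds up convergence.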
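For reference, the standard CMDP objective and the generic primal-dual iteration that APDO builds on can be written as below. The notation (J_R, J_C, budget d, multiplier λ, step sizes η_θ and η_λ) is assumed here rather than taken from the paper, and APDO's own off-policy dual update is not reproduced.

```latex
\begin{aligned}
&\max_{\theta}\; J_R(\theta) = \mathbb{E}_{\tau\sim\pi_\theta}\Big[\sum_{t=0}^{\infty}\gamma^{t} r(s_t,a_t)\Big]
\quad\text{s.t.}\quad
J_C(\theta) = \mathbb{E}_{\tau\sim\pi_\theta}\Big[\sum_{t=0}^{\infty}\gamma^{t} c(s_t,a_t)\Big] \le d,\\
&\mathcal{L}(\theta,\lambda) = J_R(\theta) - \lambda\,\big(J_C(\theta) - d\big), \qquad \lambda \ge 0,\\
&\nabla_\theta \mathcal{L}(\theta,\lambda) = \mathbb{E}_{\tau\sim\pi_\theta}\Big[\sum_{t=0}^{\infty}\nabla_\theta \log\pi_\theta(a_t\mid s_t)\sum_{t'\ge t}\gamma^{t'}\big(r(s_{t'},a_{t'}) - \lambda\,c(s_{t'},a_{t'})\big)\Big],\\
&\theta_{k+1} = \theta_k + \eta_\theta\,\widehat{\nabla_\theta\mathcal{L}}(\theta_k,\lambda_k),
\qquad
\lambda_{k+1} = \big[\lambda_k + \eta_\lambda\,\big(\widehat{J}_C(\theta_k) - d\big)\big]_+ .
\end{aligned}
```

In the on-policy methods the abstract refers to, both hatted estimates come from fresh rollouts of π_{θ_k}. APDO keeps the on-policy likelihood-ratio step for θ but incorporates an off-policy trained dual variable into the λ update.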
Findings
APDO outperforms existing methods in sample efficiency.
APDO converges faster on a simulated robot locomotion task.
Experimental results validate the effectiveness of off-policy dual updates.
Abstract
The constrained Markov decision process (CMDP) is a natural framework for reinforcement learning tasks with safety constraints, where agents learn a policy that maximizes the long-term reward while satisfying constraints on the long-term cost. A canonical approach for solving CMDPs is the primal-dual method, which updates parameters in the primal and dual spaces in turn. Existing methods for CMDPs use only on-policy data for dual updates, which results in sample inefficiency and slow convergence. In this paper, we propose a policy search method for CMDPs called Accelerated Primal-Dual Optimization (APDO), which incorporates an off-policy trained dual variable in the dual update procedure while updating the policy in the primal space with an on-policy likelihood ratio gradient. Experimental results on a simulated robot locomotion task show that APDO achieves better sample efficiency and faster…
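The primal-dual loop described in the abstract can be sketched on a toy problem: a hypothetical one-step, three-armed constrained bandit with a softmax policy. The primal step is an on-policy likelihood-ratio (REINFORCE) gradient on r − λc, and the dual step is projected ascent on λ. The `off_policy_dual=True` branch is an assumption for illustration only. It reuses logged batches through ordinary importance sampling to estimate the cost term, which is a stand-in and not APDO's off-policy trained dual variable. All names, constants, and step sizes are hypothetical.

```python
import numpy as np

# Hypothetical toy CMDP: a one-step, 3-armed constrained bandit (not from the paper).
REWARD_MEAN = np.array([1.0, 0.6, 0.2])
COST_MEAN = np.array([1.0, 0.3, 0.0])
BUDGET = 0.5  # d in the constraint J_C(theta) <= d


def softmax(logits):
    z = np.exp(logits - logits.max())  # shift for numerical stability
    return z / z.sum()


def sample(probs, n, rng):
    """Collect n one-step episodes under probs; rewards and costs get Gaussian noise."""
    actions = rng.choice(len(probs), size=n, p=probs)
    rewards = REWARD_MEAN[actions] + 0.1 * rng.standard_normal(n)
    costs = COST_MEAN[actions] + 0.1 * rng.standard_normal(n)
    return actions, rewards, costs


def primal_step(theta, lam, actions, rewards, costs, lr):
    """On-policy likelihood-ratio (REINFORCE) ascent on E[r - lam * c]."""
    probs = softmax(theta)
    signal = rewards - lam * costs
    signal = signal - signal.mean()  # batch-mean baseline to reduce variance
    # Gradient of log softmax(theta)[a] w.r.t. theta is onehot(a) - probs.
    score = np.eye(len(theta))[actions] - probs
    return theta + lr * (score * signal[:, None]).mean(axis=0)


def dual_step(lam, cost_estimate, lr):
    """Projected ascent on lambda: lam <- max(0, lam + lr * (J_C_hat - d))."""
    return max(0.0, lam + lr * (cost_estimate - BUDGET))


def is_cost_estimate(probs, actions, behavior_p, costs):
    """Importance-weighted estimate of J_C under probs from logged (off-policy) samples.

    behavior_p[i] is the probability the logging policy assigned to actions[i].
    """
    weights = probs[actions] / behavior_p
    return float(np.mean(weights * costs))


def train(iters=2000, batch=64, off_policy_dual=False, buffer_batches=10, seed=0):
    rng = np.random.default_rng(seed)
    theta, lam = np.zeros(len(REWARD_MEAN)), 0.0
    buffer = []  # (actions, behavior probabilities, costs) of recent batches
    for _ in range(iters):
        probs = softmax(theta)
        actions, rewards, costs = sample(probs, batch, rng)
        buffer = (buffer + [(actions, probs[actions], costs)])[-buffer_batches:]
        new_theta = primal_step(theta, lam, actions, rewards, costs, lr=0.5)
        if off_policy_dual:
            # Hypothetical variant: reuse logged batches to estimate the new policy's cost.
            a, mu, c = (np.concatenate(col) for col in zip(*buffer))
            cost_hat = is_cost_estimate(softmax(new_theta), a, mu, c)
        else:
            cost_hat = float(costs.mean())  # on-policy estimate of J_C(theta_k)
        theta, lam = new_theta, dual_step(lam, cost_hat, lr=0.05)
    return softmax(theta), lam
```

Calling `train()` and `train(off_policy_dual=True)` changes only the data that drives the λ update. The policy step stays on-policy in both cases, which loosely mirrors the division of labor in the abstract. No particular convergence numbers are claimed for this sketch.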
Taxonomy
Topics: Reinforcement Learning in Robotics · Electric Vehicles and Infrastructure · Autonomous Vehicle Technology and Safety
