Accelerated Primal-Dual Policy Optimization for Safe Reinforcement Learning

Qingkai Liang; Fanyu Que; Eytan Modiano

arXiv:1802.06480 · cs.AI · February 20, 2018 · 65 citations

TL;DR

This paper introduces Accelerated Primal-Dual Optimization (APDO), a policy search method for safe reinforcement learning in constrained Markov decision processes (CMDPs). APDO improves sample efficiency and convergence speed by incorporating an off-policy trained dual variable into the dual update.

Contribution

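Existing primal-dual methods for CMDPs use only on-policy data for dual updates. The paper instead incorporates an off-policy trained dual variable into the dual update, while still updating the policy in primal space with the on-policy likelihood ratio gradient.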

Findings

01 APDO is more sample-efficient than existing methods on a simulated robot locomotion task.

02 APDO converges faster on the same task.

03 The experimental results support incorporating an off-policy trained dual variable in the dual update.

Abstract

Constrained Markov Decision Process (CMDP) is a natural framework for reinforcement learning tasks with safety constraints, where agents learn a policy that maximizes the long-term reward while satisfying the constraints on the long-term cost. A canonical approach for solving CMDPs is the primal-dual method which updates parameters in primal and dual spaces in turn. Existing methods for CMDPs only use on-policy data for dual updates, which results in sample inefficiency and slow convergence. In this paper, we propose a policy search method for CMDPs called Accelerated Primal-Dual Optimization (APDO), which incorporates an off-policy trained dual variable in the dual update procedure while updating the policy in primal space with on-policy likelihood ratio gradient. Experimental results on a simulated robot locomotion task show that APDO achieves better sample efficiency and faster…
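
The alternating primal and dual updates mentioned in the abstract can be illustrated with a minimal sketch of a generic primal-dual (Lagrangian) loop. This is not the paper's APDO algorithm and does not include its off-policy dual variable. Everything here is an assumption made for illustration: a hypothetical two-action problem with exact expectations instead of sampled trajectories, a one-parameter sigmoid policy, and the names REWARD, COST, COST_LIMIT, primal_step, dual_step, and run. The Lagrangian used is L(theta, lam) = reward(theta) - lam * (cost(theta) - COST_LIMIT).

```python
import math

# Hypothetical toy problem (not from the paper): action 0 is safe, action 1 is risky.
REWARD = (0.0, 1.0)   # reward of each action
COST = (0.0, 1.0)     # safety cost of each action
COST_LIMIT = 0.4      # constraint: expected cost must not exceed this value


def risky_prob(theta):
    """Probability of the risky action under a one-parameter sigmoid policy."""
    return 1.0 / (1.0 + math.exp(-theta))


def expected_reward_and_cost(theta):
    """Exact expected reward and cost of the policy (no sampling in this sketch)."""
    p = risky_prob(theta)
    reward = (1.0 - p) * REWARD[0] + p * REWARD[1]
    cost = (1.0 - p) * COST[0] + p * COST[1]
    return reward, cost


def primal_step(theta, lam, lr):
    """Gradient ascent in theta on L = reward - lam * (cost - COST_LIMIT)."""
    p = risky_prob(theta)
    dp_dtheta = p * (1.0 - p)  # derivative of the sigmoid
    d_reward = (REWARD[1] - REWARD[0]) * dp_dtheta
    d_cost = (COST[1] - COST[0]) * dp_dtheta
    return theta + lr * (d_reward - lam * d_cost)


def dual_step(lam, cost, lr):
    """Projected update of the multiplier: it grows while the constraint is violated."""
    return max(0.0, lam + lr * (cost - COST_LIMIT))


def run(steps=200, primal_lr=0.5, dual_lr=0.5):
    """Alternate primal and dual updates; return (risky probability, lam) per step."""
    theta, lam = 0.0, 0.0
    history = []
    for _ in range(steps):
        theta = primal_step(theta, lam, primal_lr)
        _, cost = expected_reward_and_cost(theta)
        lam = dual_step(lam, cost, dual_lr)
        history.append((risky_prob(theta), lam))
    return history


if __name__ == "__main__":
    for step, (p, lam) in enumerate(run()):
        if step % 40 == 0:
            print(f"step={step:3d}  risky_prob={p:.3f}  lambda={lam:.3f}")
```

In this sketch the multiplier acts as a price on cost: it rises while the expected cost exceeds the limit, which pushes the primal step toward the safe action, and it is clipped at zero once the constraint is slack. The sketch makes no claim about convergence of the iterates.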

Taxonomy

Topics: Reinforcement Learning in Robotics · Electric Vehicles and Infrastructure · Autonomous Vehicle Technology and Safety