Provably Efficient Safe Exploration via Primal-Dual Policy Optimization

Dongsheng Ding; Xiaohan Wei; Zhuoran Yang; Zhaoran Wang; Mihailo R. Jovanović

arXiv:2003.00534 · cs.LG · October 27, 2020 · 35 cites

Open Access

TL;DR

This paper introduces a provably efficient algorithm for safe reinforcement learning that balances reward maximization against safety constraints in a function approximation setting, including environments with infinite state spaces.

Contribution

It proposes the first provably efficient online policy optimization algorithm for CMDPs with safety constraints under function approximation.

Findings

01

Achieves $\tilde{O}(d H^{2.5}\sqrt{T})$ regret and constraint violation bounds.

02

Handles infinite state spaces via feature mapping.

03

Provides theoretical guarantees for safe exploration in CMDPs.

Abstract

We study the Safe Reinforcement Learning (SRL) problem using the Constrained Markov Decision Process (CMDP) formulation in which an agent aims to maximize the expected total reward subject to a safety constraint on the expected total value of a utility function. We focus on an episodic setting with the function approximation where the Markov transition kernels have a linear structure but do not impose any additional assumptions on the sampling model. Designing SRL algorithms with provable computational and statistical efficiency is particularly challenging under this setting because of the need to incorporate both the safety constraint and the function approximation into the fundamental exploitation/exploration tradeoff. To this end, we present an Optimistic Primal-Dual Proximal Policy Optimization (OPDOP) algorithm where the value…
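
The constrained problem described in the abstract can be sketched as follows. This is a minimal illustration in notation assumed here, not taken from the text above: $V_r^{\pi}$ and $V_g^{\pi}$ stand for the expected total reward and expected total utility of a policy $\pi$, $b$ is a hypothetical safety threshold, and $Y \ge 0$ is a hypothetical dual variable.

```latex
% Assumed notation (not from the page above):
%   V_r^\pi : expected total reward of policy \pi
%   V_g^\pi : expected total utility of policy \pi
%   b       : safety threshold on the utility value
%   Y       : nonnegative dual (Lagrange) variable
\begin{aligned}
&\text{CMDP:}         && \max_{\pi}\; V_r^{\pi} \quad \text{subject to} \quad V_g^{\pi} \ge b, \\
&\text{Lagrangian:}   && \mathcal{L}(\pi, Y) = V_r^{\pi} + Y\,\bigl(V_g^{\pi} - b\bigr), \\
&\text{Saddle point:} && \max_{\pi}\; \min_{Y \ge 0}\; \mathcal{L}(\pi, Y).
\end{aligned}
```

A generic primal-dual method alternates a policy improvement step on $\mathcal{L}$ with a projected descent step on $Y$. Whether OPDOP takes exactly this form is not stated in the truncated abstract, so this should be read as background on the primal-dual idea rather than as the paper's algorithm.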

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Adversarial Robustness in Machine Learning