Safe Exploration Incurs Nearly No Additional Sample Complexity for Reward-free RL

Ruiquan Huang; Jing Yang; Yingbin Liang

arXiv:2206.14057·cs.LG·March 23, 2023

TL;DR

This paper introduces a safe exploration framework for reward-free reinforcement learning that satisfies safety constraints during exploration at nearly no additional sample complexity, and instantiates it with algorithms for tabular and low-rank MDPs.

Contribution

The paper proposes the Safe reWard-frEe ExploraTion (SWEET) framework, together with algorithms that ensure safe exploration in reward-free RL with nearly no increase in sample complexity.

Findings

01 Algorithms achieve zero constraint violation during exploration.

02 Sample complexities match or outperform those of unconstrained methods.

03 Safety constraints do not significantly increase sample complexity.

Abstract

Reward-free reinforcement learning (RF-RL), a recently introduced RL paradigm, relies on random action-taking to explore the unknown environment without any reward feedback information. While the primary goal of the exploration phase in RF-RL is to reduce the uncertainty in the estimated model with a minimum number of trajectories, in practice the agent often needs to abide by certain safety constraints at the same time. It remains unclear how such a safe exploration requirement would affect the sample complexity needed to achieve the desired optimality of the policy obtained in planning. In this work, we make a first attempt to answer this question. In particular, we consider the scenario where a safe baseline policy is known beforehand, and propose a unified Safe reWard-frEe ExploraTion (SWEET) framework. We then particularize the SWEET framework to the tabular and the…

Videos

Safe Exploration Incurs Nearly No Additional Sample Complexity for Reward-Free RL · SlidesLive

Taxonomy

Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning