Safe Exploration Incurs Nearly No Additional Sample Complexity for Reward-free RL
Ruiquan Huang, Jing Yang, Yingbin Liang

TL;DR
This paper introduces a safe exploration framework for reward-free reinforcement learning that satisfies safety constraints with nearly no increase in sample complexity, and instantiates it as algorithms for tabular and low-rank MDPs.
Contribution
It proposes the SWEET framework and algorithms that ensure safe exploration with minimal sample complexity increase in reward-free RL.
Findings
Algorithms achieve zero constraint violation during exploration.
Sample complexities match or outperform unconstrained methods.
Safety constraints do not significantly increase sample complexity.
Abstract
Reward-free reinforcement learning (RF-RL), a recently introduced RL paradigm, relies on random action-taking to explore the unknown environment without any reward feedback information. While the primary goal of the exploration phase in RF-RL is to reduce the uncertainty in the estimated model with a minimum number of trajectories, in practice the agent often needs to abide by certain safety constraints at the same time. It remains unclear how such a safe exploration requirement would affect the sample complexity needed to achieve the desired optimality of the policy obtained in planning. In this work, we make a first attempt to answer this question. In particular, we consider the scenario where a safe baseline policy is known beforehand, and propose a unified Safe reWard-frEe ExploraTion (SWEET) framework. We then particularize the SWEET framework to the tabular and the…
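To make the role of a known safe baseline policy concrete, here is a minimal illustrative sketch. It is not the SWEET algorithm from the paper. It assumes a constraint of the form "expected cumulative cost at most `threshold`" and an agent that, at the start of each episode, runs an exploratory policy with probability `alpha` and the baseline otherwise, so the expected cost is linear in `alpha`. The function name, its parameters, and the idea of using an upper bound on the exploratory policy's cost are all assumptions made for illustration.

```python
def max_safe_mixing_weight(baseline_cost, explore_cost_upper, threshold):
    """Return the largest alpha in [0, 1] such that
    (1 - alpha) * baseline_cost + alpha * explore_cost_upper <= threshold.

    baseline_cost:      expected cost of the known safe baseline (assumed known)
    explore_cost_upper: pessimistic (upper-bound) cost estimate of the
                        exploratory policy (hypothetical input)
    threshold:          safety budget on expected cumulative cost
    """
    if baseline_cost > threshold:
        raise ValueError("the baseline policy must itself satisfy the constraint")
    if explore_cost_upper <= threshold:
        # The exploratory policy is already certified safe: use it fully.
        return 1.0
    # Here explore_cost_upper > threshold >= baseline_cost, so the
    # denominator is strictly positive and the result lies in [0, 1).
    return (threshold - baseline_cost) / (explore_cost_upper - baseline_cost)


alpha = max_safe_mixing_weight(baseline_cost=0.25,
                               explore_cost_upper=0.75,
                               threshold=0.5)
```

With these made-up numbers the agent could follow the exploratory policy in half of the episodes while keeping the expected cost within the budget; as the cost estimate tightens, `alpha` can grow toward 1.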
Taxonomy
Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning
