Safe Reinforcement Learning with Instantaneous Constraints: The Role of Aggressive Exploration

Honghao Wei; Xin Liu; Lei Ying

arXiv:2312.14470·cs.LG·December 25, 2023·2 cites

Open Access

TL;DR

This paper introduces LSVI-AE, a safe reinforcement learning algorithm that handles hard instantaneous constraints without prior knowledge of safe actions. It combines aggressive exploration with cost functions modeled in a Reproducing Kernel Hilbert Space (RKHS), and achieves near-optimal regret and constraint violation bounds.

Contribution

The paper extends safe RL with hard instantaneous constraints from linear cost functions to RKHS cost functions, drops the assumption that a safe action set is known in advance, and encourages aggressive exploration.

Findings

01

Achieves a near-optimal regret bound of $\tilde{\mathcal{O}}(\sqrt{d^{3} H^{4} K})$

02

Bounds hard constraint violation by $\mathcal{O}(H \gamma_{K} \sqrt{K})$ for RKHS costs

03

Demonstrates the effectiveness of aggressive exploration in safe RL without prior safe action knowledge.
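
To make the scaling in these findings concrete, the following minimal sketch evaluates the two bounds with all constants and logarithmic factors dropped. It assumes the square-root form of the bounds, with regret growing as sqrt(d^3 H^4 K) and RKHS violation as H γ_K sqrt(K). The function names and the sample values of d, H, and γ_K are hypothetical, chosen only for illustration; γ_K in particular depends on the kernel and on K, and is held fixed here for simplicity.

```python
import math


def regret_bound(d: int, H: int, K: int) -> float:
    """Order of the regret bound, sqrt(d^3 * H^4 * K), without constants or log factors."""
    return math.sqrt(d**3 * H**4 * K)


def rkhs_violation_bound(H: int, gamma_K: float, K: int) -> float:
    """Order of the RKHS violation bound, H * gamma_K * sqrt(K), without constants."""
    return H * gamma_K * math.sqrt(K)


def per_episode(total: float, K: int) -> float:
    """Average of a cumulative quantity over K episodes."""
    return total / K


if __name__ == "__main__":
    d, H, gamma_K = 4, 10, 2.0  # hypothetical illustration values
    for K in (10**2, 10**4, 10**6):
        r = regret_bound(d, H, K)
        v = rkhs_violation_bound(H, gamma_K, K)
        print(f"K={K:>8}  regret~{r:.1f}  avg regret~{per_episode(r, K):.3f}  "
              f"violation~{v:.1f}  avg violation~{per_episode(v, K):.4f}")
```

Because both quantities grow like sqrt(K), quadrupling the number of episodes only doubles the bound, so the per-episode average shrinks toward zero as K grows.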

Abstract

This paper studies safe Reinforcement Learning (safe RL) with linear function approximation and under hard instantaneous constraints where unsafe actions must be avoided at each step. Existing studies have considered safe RL with hard instantaneous constraints, but their approaches rely on several key assumptions: (i) the RL agent knows a safe action set for every state or knows a safe graph in which all the state-action-state triples are safe, and (ii) the constraint/cost functions are linear. In this paper, we consider safe RL with instantaneous hard constraints without assumption (i) and generalize (ii) to Reproducing Kernel Hilbert Space (RKHS). Our proposed algorithm, LSVI-AE, achieves $\tilde{\mathcal{O}}(\sqrt{d^{3} H^{4} K})$ regret and $\tilde{\mathcal{O}}(H \sqrt{d K})$ hard constraint violation when the cost function is linear and $\mathcal{O}(H \gamma_{K} \sqrt{K})$ hard constraint…

Taxonomy

Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI)

Methods: Sparse Evolutionary Training