Safe Reinforcement Learning with Instantaneous Constraints: The Role of Aggressive Exploration
Honghao Wei, Xin Liu, Lei Ying

TL;DR
This paper introduces LSVI-AE, a safe reinforcement learning algorithm that handles instantaneous constraints without prior knowledge of safe actions. It uses aggressive exploration and RKHS-based cost function approximation, and achieves near-optimal regret and constraint violation bounds.
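Here is a minimal Python sketch of one step of action selection, to make aggressive exploration concrete. It rests on one assumption, which is our reading and not the paper's stated rule: aggressive exploration means being optimistic about both reward and cost. Under that reading, an action counts as plausibly safe when the lower confidence bound of its estimated cost is within the threshold. The agent then maximizes an optimistic value estimate among those actions. The names ActionEstimate, select_action, q_ucb, cost_mean, cost_width and threshold are hypothetical. So is the fallback used when no action qualifies.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ActionEstimate:
    """Hypothetical per-action statistics at the current state."""
    q_ucb: float       # optimistic (upper confidence) estimate of the action value
    cost_mean: float   # estimated instantaneous cost of the action
    cost_width: float  # confidence width of the cost estimate


def select_action(estimates: Sequence[ActionEstimate], threshold: float) -> int:
    """Return the index of the chosen action.

    Assumed rule: an action is plausibly safe if its cost lower confidence
    bound (mean - width) is within the threshold; among those, take the one
    with the largest optimistic value.
    """
    plausibly_safe = [
        i for i, e in enumerate(estimates)
        if e.cost_mean - e.cost_width <= threshold
    ]
    if not plausibly_safe:
        # Hypothetical fallback: pick the action with the smallest cost lower bound.
        return min(range(len(estimates)),
                   key=lambda i: estimates[i].cost_mean - estimates[i].cost_width)
    return max(plausibly_safe, key=lambda i: estimates[i].q_ucb)


actions = [
    ActionEstimate(q_ucb=1.0, cost_mean=0.2, cost_width=0.05),  # clearly safe
    ActionEstimate(q_ucb=2.0, cost_mean=0.9, cost_width=0.5),   # uncertain cost
    ActionEstimate(q_ucb=3.0, cost_mean=1.5, cost_width=0.1),   # clearly unsafe
]
print(select_action(actions, threshold=0.5))  # 1
```

A pessimistic rule, using the upper confidence bound on cost, would admit only action 0 on the same data. The optimistic rule also tries action 1, whose cost is still uncertain. That is the kind of exploration that can incur some constraint violation while learning.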
Contribution
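It generalizes safe RL with instantaneous constraints to cost functions in a reproducing kernel Hilbert space (RKHS). It does so without assuming that safe actions are known in advance, and it highlights the role of aggressive exploration.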
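The RKHS generalization means the cost is learned with a kernel rather than a fixed linear feature map. A common way to do this is kernel ridge regression with a posterior-style confidence width; this is assumed here, not taken from the paper. The pure-Python sketch below computes the mean, width and lower confidence bound of the cost at a query point. The squared-exponential kernel, the regularizer lam, the width multiplier beta and all function names are illustrative choices.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def rbf(x: Vector, z: Vector, lengthscale: float = 1.0) -> float:
    """Squared-exponential kernel (an assumed choice of RKHS kernel)."""
    sq = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq / (2.0 * lengthscale ** 2))


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def cost_estimate(data_x: List[Vector], data_y: List[float], query: Vector,
                  lam: float = 0.1, beta: float = 1.0) -> Tuple[float, float, float]:
    """Kernel ridge regression estimate of the cost at `query`.

    Returns (mean, width, mean - width). `lam` and `beta` are hypothetical
    hyperparameters, not values from the paper.
    """
    n = len(data_x)
    gram = [[rbf(data_x[i], data_x[j]) + (lam if i == j else 0.0)
             for j in range(n)] for i in range(n)]
    k_q = [rbf(x, query) for x in data_x]
    alpha = solve(gram, list(data_y))          # (K + lam I)^{-1} y
    mean = sum(k * a for k, a in zip(k_q, alpha))
    v = solve(gram, k_q)                       # (K + lam I)^{-1} k(query)
    var = max(rbf(query, query) - sum(k * w for k, w in zip(k_q, v)), 0.0)
    width = beta * math.sqrt(var)
    return mean, width, mean - width


xs = [[0.0], [1.0], [2.0]]
ys = [0.1, 0.5, 0.9]
near = cost_estimate(xs, ys, [1.0])   # inside the observed data
far = cost_estimate(xs, ys, [10.0])   # far from any observation
print(near, far)
```

Far from the observed points the width grows toward beta. A rule based on the lower confidence bound therefore treats unexplored actions as plausibly safe, while near the data the estimate tightens.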
Findings
Achieves a near-optimal regret bound of $\tilde{\mathcal{O}}(\sqrt{d^3H^4K})$
Bounds hard constraint violation by $\mathcal{O}(H\gamma_K\sqrt{K})$ for RKHS cost functions
Demonstrates the effectiveness of aggressive exploration in safe RL without prior knowledge of safe actions.
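Dividing each bound by the number of episodes K shows why both are sublinear. Some conventions apply here:
- The labels Regret(K) and Violation(K) are introduced only for illustration.
- Logarithmic factors are hidden in $\tilde{\mathcal{O}}$.
- The regret expression assumes the bound takes the square-root form $\tilde{\mathcal{O}}(\sqrt{d^3H^4K})$.

```latex
\frac{\mathrm{Regret}(K)}{K}
  = \tilde{\mathcal{O}}\!\left(\frac{\sqrt{d^3H^4K}}{K}\right)
  = \tilde{\mathcal{O}}\!\left(\sqrt{\frac{d^3H^4}{K}}\right),
\qquad
\frac{\mathrm{Violation}(K)}{K}
  = \mathcal{O}\!\left(\frac{H\gamma_K\sqrt{K}}{K}\right)
  = \mathcal{O}\!\left(\frac{H\gamma_K}{\sqrt{K}}\right).
```

The average regret per episode therefore vanishes as K grows. The average violation per episode vanishes whenever the kernel's information gain $\gamma_K$ grows more slowly than $\sqrt{K}$.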
Abstract
This paper studies safe Reinforcement Learning (safe RL) with linear function approximation under hard instantaneous constraints, where unsafe actions must be avoided at each step. Existing studies have considered safe RL with hard instantaneous constraints, but their approaches rely on several key assumptions. Either the RL agent knows a safe action set for every state, or it knows a safe graph in which all the state-action-state triples are safe; in addition, the constraint/cost functions are linear. In this paper, we consider safe RL with instantaneous hard constraints without these assumptions and generalize to cost functions in a Reproducing Kernel Hilbert Space (RKHS). Our proposed algorithm, LSVI-AE, achieves regret and hard constraint violation when the cost function is linear and hard constraint…
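A hard instantaneous constraint differs from a cumulative one in how violation is counted. The sketch below assumes hard violation is the sum over all steps of the positive part of the excess cost, $\sum_{k=1}^{K}\sum_{h=1}^{H}[c(s_h^k,a_h^k)-\tau]^+$, so a safe step cannot cancel an unsafe one. A cumulative count that lets slack offset excess is included only for contrast. The threshold $\tau$ and the function names are hypothetical, and the paper's exact definition may differ.

```python
from typing import Sequence


def hard_violation(episodes: Sequence[Sequence[float]], threshold: float) -> float:
    """Sum of per-step excess cost max(c - threshold, 0) over all episodes and steps.

    Steps within the threshold contribute zero, so they cannot offset unsafe steps.
    """
    return sum(max(c - threshold, 0.0) for ep in episodes for c in ep)


def cumulative_violation(episodes: Sequence[Sequence[float]], threshold: float) -> float:
    """For contrast only: total excess over all steps, where slack on safe steps
    offsets excess on unsafe ones before the positive part is taken."""
    return max(sum(c - threshold for ep in episodes for c in ep), 0.0)


# Two episodes of per-step costs; only the 0.9 step exceeds the threshold 0.5.
costs = [[0.2, 0.9, 0.1], [0.3, 0.3]]
print(hard_violation(costs, 0.5))        # 0.4
print(cumulative_violation(costs, 0.5))  # 0.0
```

The cumulative measure reports no violation here, even though one step was unsafe. The per-step measure still counts that step.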
Taxonomy
Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI)
Methods: Sparse Evolutionary Training
