Safe-Support Q-Learning: Learning without Unsafe Exploration
Yeeun Lim, Narim Jeong, Donghwan Lee

TL;DR
This paper introduces a safe reinforcement learning framework that ensures no unsafe states are visited during training by using a behavior policy supported on a safe set and a KL-regularized Bellman target.
Contribution
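The paper proposes a Q-learning-based safe RL method that avoids unsafe state visitation during training, under the assumption that trajectories induced by the behavior policy remain within the safe set, while still exploring sufficiently within that set.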
Findings
Achieves stable learning and well-calibrated value estimates.
Yields safer behavior with comparable or better performance than baselines.
Supports different action spaces and behavior policies.
Abstract
Ensuring safety during reinforcement learning (RL) training is critical in real-world applications where unsafe exploration can lead to devastating outcomes. While most safe RL methods mitigate risk through constraints or penalization, they still allow exploration of unsafe states during training. In this work, we adopt a stricter safety requirement that eliminates unsafe state visitation during training. To achieve this goal, we propose a Q-learning-based safe RL framework that leverages a behavior policy supported on a safe set. Under the assumption that the induced trajectories remain within the safe set, this policy enables sufficient exploration within the safe region without requiring near-optimality. We adopt a two-stage framework in which the Q-function and policy are trained separately. Specifically, we introduce a KL-regularized Bellman target that constrains the Q-function to…
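The abstract is cut off before it states the exact form of the KL-regularized Bellman target, so the sketch below is an illustration only and not the authors' formula. It shows one standard KL-regularized backup for a discrete action space, r + γ τ log Σ_a β(a|s') exp(Q(s',a)/τ), where β is the behavior policy. The function name, the temperature `tau`, and the tabular setting are all assumptions made for this example. Actions outside the support of β get zero weight, so their Q-values cannot influence the target.

```python
import numpy as np


def kl_regularized_target(reward, next_q, behavior_probs, gamma=0.99, tau=1.0):
    """Hypothetical KL-regularized Bellman target for one transition.

    Computes r + gamma * tau * log sum_a beta(a|s') * exp(Q(s', a) / tau).
    This is the closed form of max_pi E_pi[Q] - tau * KL(pi || beta),
    assumed here for illustration; the paper's target may differ.
    """
    next_q = np.asarray(next_q, dtype=float)
    behavior_probs = np.asarray(behavior_probs, dtype=float)

    # Restrict to actions the behavior policy can actually take.
    support = behavior_probs > 0
    scaled = next_q[support] / tau

    # Subtract the max before exponentiating for numerical stability.
    shift = scaled.max()
    weighted = behavior_probs[support] * np.exp(scaled - shift)
    soft_value = tau * (shift + np.log(weighted.sum()))

    return reward + gamma * soft_value


# Action 1 has a large Q-value but lies outside the behavior policy's support,
# so only action 0 contributes to the target.
target = kl_regularized_target(
    reward=0.0, next_q=[1.0, 100.0], behavior_probs=[1.0, 0.0], gamma=0.5
)
```

In this form, a small `tau` moves the backup toward the maximum Q-value over supported actions, while a large `tau` moves it toward the expected Q-value under the behavior policy.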
