Feasibility-Aware Pessimistic Estimation: Toward Long-Horizon Safety in Offline RL
Zhikun Tao

TL;DR
This paper introduces FASP, an offline safe reinforcement learning framework that targets long-horizon safety and handles out-of-distribution actions by combining Hamilton-Jacobi reachability analysis with CVAE-based pessimistic estimation, supported by theoretical guarantees.
Contribution
FASP combines Hamilton-Jacobi reachability analysis, a conditional variational autoencoder (CVAE), and pessimistic Q-value estimation to improve long-horizon safety and out-of-distribution handling in offline RL, with theoretical validation.
Findings
FASP achieves superior safety performance on DSRL benchmarks.
The framework provides rigorous long-horizon safety guarantees.
FASP demonstrates competitive or improved results compared to state-of-the-art methods.
Abstract
Offline safe reinforcement learning (OSRL), which derives constraint-satisfying policies from pre-collected datasets, offers a promising avenue for deploying RL in safety-critical real-world domains such as robotics. However, the majority of existing approaches emphasize only short-term safety, neglecting long-horizon considerations. Consequently, they may violate safety constraints and fail to ensure sustained protection during online deployment. Moreover, the learned policies often struggle to handle states and actions that are absent from the offline dataset or out-of-distribution (OOD) with respect to it, and they exhibit limited sample efficiency. To address these challenges, we propose a novel framework, Feasibility-Aware offline Safe Reinforcement Learning with CVAE-based Pessimism (FASP). First, we employ Hamilton-Jacobi (H-J) reachability analysis to generate reliable safety labels, which serve as…
