Safe Flow Q-Learning: Offline Safe Reinforcement Learning with Reachability-Based Flow Policies
Mumuksh Tayal, Manan Tayal, Ravi Prakash

TL;DR
Safe Flow Q-Learning (SafeFQL) is an offline safe RL method that combines a reachability-inspired safety value function with a one-step flow policy and a conformal calibration step, targeting real-time safe control with finite-sample safety guarantees and low inference latency.
Contribution
SafeFQL extends FQL by integrating a safety value function, flow policy, and conformal calibration for finite-sample safety guarantees, improving safety and efficiency in offline RL.
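The conformal calibration idea can be illustrated with a generic split-conformal quantile. This is a minimal sketch and not the paper's procedure: the names `conformal_threshold` and `is_certified_safe`, the score definition (realized worst-case cost minus predicted safety value on held-out data), and the sign convention (positive values mean unsafe) are all assumptions made for illustration.

```python
import math


def conformal_threshold(scores, alpha):
    """Split-conformal quantile of held-out nonconformity scores.

    Returns the k-th smallest score with k = ceil((n + 1) * (1 - alpha)).
    If k exceeds n, no finite margin gives the requested coverage.
    """
    n = len(scores)
    if n == 0:
        raise ValueError("need at least one calibration score")
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf
    return sorted(scores)[k - 1]


def is_certified_safe(predicted_value, margin):
    """Assumed convention: a value <= 0 means safe; the margin tightens the test."""
    return predicted_value + margin <= 0.0


# Hypothetical calibration scores: realized cost minus predicted safety value.
CALIBRATION_SCORES = [0.1, -0.2, 0.05, 0.3, 0.0, -0.1, 0.2, 0.15, -0.05]
MARGIN = conformal_threshold(CALIBRATION_SCORES, alpha=0.25)
```

Under the usual exchangeability assumption, a fresh score falls at or below `MARGIN` with probability at least `1 - alpha`, so shifting the safety threshold by `MARGIN` absorbs the learned value function's typical underestimation of risk.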
Findings
SafeFQL achieves lower constraint violations than prior methods.
SafeFQL offers faster inference suitable for real-time deployment.
SafeFQL matches or exceeds previous offline safe RL performance.
Abstract
Offline safe reinforcement learning (RL) seeks reward-maximizing policies from static datasets under strict safety constraints. Existing methods often rely on soft expected-cost objectives or iterative generative inference, which can be insufficient for safety-critical real-time control. We propose Safe Flow Q-Learning (SafeFQL), which extends FQL to safe offline RL by combining a Hamilton-Jacobi reachability-inspired safety value function with an efficient one-step flow policy. SafeFQL learns the safety value via a self-consistency Bellman recursion, trains a flow policy by behavioral cloning, and distills it into a one-step actor for reward-maximizing safe action selection without rejection sampling at deployment. To account for finite-data approximation error in the learned safety boundary, we add a conformal prediction calibration step that adjusts the safety threshold and provides…
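To make the reachability-style safety value concrete, here is a small tabular sketch. It assumes the discounted safety backup common in the reachability literature, written in a cost convention where positive values mean unsafe. The excerpt does not give the paper's exact recursion, and SafeFQL learns its value from offline data with function approximation, so the toy chain, `safety_value_iteration`, `COST`, and `NEXT_STATE` are all hypothetical.

```python
def safety_value_iteration(cost, next_state, gamma=0.9, sweeps=200):
    """Iterate V(s) = (1 - gamma) * c(s) + gamma * max(c(s), min_a V(s')).

    cost[s] > 0 marks a constraint violation at state s.
    next_state[s] lists the deterministic successor of s for each action.
    """
    values = list(cost)
    for _ in range(sweeps):
        # Synchronous sweep: every state is updated from the previous values.
        values = [
            (1.0 - gamma) * cost[s]
            + gamma * max(cost[s], min(values[n] for n in next_state[s]))
            for s in range(len(cost))
        ]
    return values


# Toy 5-state chain: state 4 violates the constraint, and state 3 is a point
# of no return because both of its actions lead to state 4.
COST = [-1.0, -1.0, -1.0, -1.0, 1.0]
NEXT_STATE = [(0, 1), (0, 2), (1, 3), (4, 4), (4, 4)]
VALUES = safety_value_iteration(COST, NEXT_STATE)
```

In this toy chain, state 3 incurs no immediate cost yet receives a positive value because every action from it ends in violation, while state 2 stays negative because it can still move away. That gap between instantaneous cost and worst-case future cost is what separates a reachability-style value from a soft expected-cost objective.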
Taxonomy
Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Maritime Navigation and Safety
