Safe Flow Q-Learning: Offline Safe Reinforcement Learning with Reachability-Based Flow Policies

Mumuksh Tayal; Manan Tayal; Ravi Prakash

arXiv:2603.15136·cs.LG·March 17, 2026

TL;DR

Safe Flow Q-Learning (SafeFQL) is an offline safe RL method that combines a reachability-inspired safety value function with a one-step flow policy, targeting real-time safe control with finite-sample safety guarantees and low inference latency.

Contribution

SafeFQL extends Flow Q-Learning (FQL) with a reachability-inspired safety value function, a one-step flow policy, and a conformal calibration step that provides finite-sample safety guarantees, improving safety and inference efficiency in offline RL.
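
To illustrate how a conformal calibration step can turn held-out data into a finite-sample threshold, here is a minimal sketch of standard split conformal prediction. It is not taken from the paper: the function names `conformal_quantile` and `calibrated_threshold`, the score definition (realized minus predicted safety value), and the sign convention (values above zero indicate a violation) are all assumptions made for illustration.

```python
import math
import numpy as np

def conformal_quantile(scores, alpha):
    """Split-conformal quantile: the ceil((n + 1) * (1 - alpha))-th smallest score."""
    scores = np.sort(np.asarray(scores, dtype=float))
    n = scores.size
    k = math.ceil((n + 1) * (1.0 - alpha))
    if k > n:
        # Too few calibration points to support this miscoverage level.
        return float("inf")
    return float(scores[k - 1])

def calibrated_threshold(predicted, realized, alpha):
    """Shift the nominal safety threshold (zero) by the conformal quantile.

    Assumed convention: a safety value above zero means a violation, so the
    score measures how much the learned value underestimated the danger.
    """
    scores = np.asarray(realized, dtype=float) - np.asarray(predicted, dtype=float)
    return -conformal_quantile(scores, alpha)
```

Under the usual exchangeability assumption, a fresh point satisfies `realized <= predicted + q` with probability at least `1 - alpha`, where `q` is the returned quantile. Treating a state as safe only when its predicted value is at most `-q` is one way to use that margin; when the calibration set is too small, the threshold becomes negative infinity and no state is certified.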

Findings

01. SafeFQL achieves lower constraint violations than prior methods.

02. SafeFQL offers faster inference suitable for real-time deployment.

03. SafeFQL matches or exceeds previous offline safe RL performance.

Abstract

Offline safe reinforcement learning (RL) seeks reward-maximizing policies from static datasets under strict safety constraints. Existing methods often rely on soft expected-cost objectives or iterative generative inference, which can be insufficient for safety-critical real-time control. We propose Safe Flow Q-Learning (SafeFQL), which extends FQL to safe offline RL by combining a Hamilton-Jacobi reachability-inspired safety value function with an efficient one-step flow policy. SafeFQL learns the safety value via a self-consistency Bellman recursion, trains a flow policy by behavioral cloning, and distills it into a one-step actor for reward-maximizing safe action selection without rejection sampling at deployment. To account for finite-data approximation error in the learned safety boundary, we add a conformal prediction calibration step that adjusts the safety threshold and provides…
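
The abstract does not spell out the Bellman recursion used for the safety value. As a rough illustration, the sketch below iterates a discounted reachability backup of the form V(s) = (1 - γ) h(s) + γ max(h(s), min_a V(s')), which is a common choice in the reachability-based safe RL literature. The tabular setting, the 5-state chain, the name `safety_value_iteration`, and the convention that h(s) > 0 marks a violation are all assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def safety_value_iteration(h, next_state, gamma=0.9, iters=500):
    """Tabular fixed-point iteration of a discounted reachability backup.

    h[s] > 0 marks a constraint violation at state s (assumed convention).
    next_state[s, a] is the deterministic successor of taking action a in s.
    """
    h = np.asarray(h, dtype=float)
    V = h.copy()
    for _ in range(iters):
        # Value of the safest available action in each state.
        best_next = V[next_state].min(axis=1)
        # Worst constraint value seen now or later, discounted toward h.
        V = (1.0 - gamma) * h + gamma * np.maximum(h, best_next)
    return V

# Hypothetical 5-state chain: state 4 violates the constraint, and state 3
# is a point of no return because both of its actions lead to state 4.
h = np.array([-1.0, -1.0, -1.0, -1.0, 1.0])
next_state = np.array([[0, 1], [0, 2], [1, 3], [4, 4], [4, 4]])
V = safety_value_iteration(h, next_state)
```

In this toy example the value at state 3 comes out positive even though `h[3]` is negative, because no action from that state avoids the violation. That is the distinction between a reachability-style safety value and an instantaneous cost: it flags states from which a violation can no longer be avoided.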

Taxonomy

Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Maritime Navigation and Safety