Q-Flow: Stable and Expressive Reinforcement Learning with Flow-Based Policy

JaeHyeok Doo; Byeongguk Jeon; Seonghyeon Ye; Kimin Lee; Minjoon Seo

arXiv:2605.13435 · cs.LG · May 14, 2026

TL;DR

Q-Flow is a reinforcement learning framework that optimizes flow-based policies for value maximization without backpropagating through the numerical solver, so the policy stays stable to train without giving up its expressive capacity.

Contribution

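Q-Flow uses the deterministic nature of flow dynamics to propagate terminal trajectory value to intermediate latent states along the policy-induced flow. The resulting intermediate value gradients allow stable policy optimization without unrolling the solver and without restricting the policy's expressivity.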

Findings

01 Q-Flow outperforms state-of-the-art baselines by 10.6 percentage points on OGBench.

02 Q-Flow supports stable online adaptation within the same framework.

03 Q-Flow resolves the trade-off between optimization stability and policy expressivity.

Abstract

There is growing interest in utilizing flow-based models as decision-making policies in reinforcement learning due to their high expressive capacity. However, effectively leveraging this expressivity for value maximization remains challenging, as naive gradient-based optimization requires backpropagating through numerical solvers and often leads to instability. Existing approaches typically address this issue by restricting the expressive capacity of flow-based policies, resulting in a trade-off between optimization stability and representational flexibility. To resolve this, we introduce Q-Flow, a framework that leverages the deterministic nature of flow dynamics to explicitly propagate terminal trajectory value to intermediate latent states along the policy-induced flow. This formulation enables stable policy optimization using intermediate value gradients without unrolling the…
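
To make the idea of propagating terminal value along a deterministic flow concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the paper's implementation: the velocity field `velocity`, the critic `terminal_q`, the Euler solver, and the step count are all hypothetical toy choices. Because the flow is deterministic, every intermediate latent on a trajectory leads to exactly one final action, so the value of that final action can be assigned as a target to each intermediate latent.

```python
import numpy as np

def velocity(state, action, t):
    # Hypothetical toy velocity field: pulls the latent action toward the state.
    return state - action

def terminal_q(state, action):
    # Hypothetical toy critic: larger when the final action is close to the state.
    return -float(np.sum((action - state) ** 2))

def integrate_flow(state, noise, num_steps=10):
    """Euler-integrate da/dt = velocity(state, a, t) from t=0 to t=1.

    Returns all latents on the trajectory, shape (num_steps + 1, action_dim).
    """
    dt = 1.0 / num_steps
    a = np.asarray(noise, dtype=float)
    latents = [a]
    for k in range(num_steps):
        a = a + dt * velocity(state, a, k * dt)
        latents.append(a)
    return np.stack(latents)

def intermediate_value_targets(state, noise, num_steps=10):
    """Assign the terminal value of a trajectory to every latent on it.

    Returns (latents, times, targets): one regression target per (latent, time)
    pair, all equal to the value of the trajectory's final action.
    """
    latents = integrate_flow(state, noise, num_steps)
    q_final = terminal_q(state, latents[-1])
    times = np.linspace(0.0, 1.0, num_steps + 1)
    targets = np.full(num_steps + 1, q_final)
    return latents, times, targets

state = np.array([1.0, -1.0])
noise = np.zeros(2)
latents, times, targets = intermediate_value_targets(state, noise)
```

In this sketch, a value function fitted to `(state, latents[k], times[k]) -> targets[k]` could supply a gradient with respect to an intermediate latent directly, which is one way to read the abstract's claim about avoiding solver unrolling. How Q-Flow actually trains and uses such a function is not specified in the text above.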
