Q-Flow: Stable and Expressive Reinforcement Learning with Flow-Based Policy
JaeHyeok Doo, Byeongguk Jeon, Seonghyeon Ye, Kimin Lee, Minjoon Seo

TL;DR
Q-Flow is a reinforcement learning framework that uses flow-based policies for stable and expressive decision-making. It avoids the instability of optimizing such policies by backpropagating through numerical solvers.
Contribution
Q-Flow exploits the deterministic nature of flow dynamics to propagate terminal trajectory value to intermediate latent states. This enables stable gradient-based policy optimization without unrolling the solver and without restricting the policy's expressivity.
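The idea of propagating value along a deterministic flow can be written out as follows. The notation is our own assumption and is not taken from the paper. The policy maps noise x_0 ~ N(0, I) to an action a = x_1 through the ODE dx_t/dt = v_θ(s, x_t, t), and Φ_{t→1} denotes the flow map from time t to time 1. Because the map is deterministic, the terminal value is well defined at every intermediate latent state. Its gradient with respect to the latent is exactly the adjoint quantity that backpropagation through the solver would otherwise compute.

```latex
% Assumed notation (ours, not the paper's)
\frac{d x_t}{dt} = v_\theta(s, x_t, t), \qquad x_0 \sim \mathcal{N}(0, I), \qquad a = x_1 = \Phi_{0 \to 1}(x_0)

% Terminal value propagated to an intermediate latent state
Q_t(s, x) := Q\big(s, \Phi_{t \to 1}(x)\big), \qquad Q_1 = Q

% Constant along every flow trajectory because the dynamics are deterministic
Q_t(s, x_t) = Q(s, a) \quad \forall t \in [0, 1]
\;\Longrightarrow\;
\partial_t Q_t(s, x_t) + \nabla_x Q_t(s, x_t)^\top v_\theta(s, x_t, t) = 0

% Policy gradient assembled from intermediate value gradients (adjoint form)
\nabla_\theta\, Q\big(s, \Phi_{0 \to 1}(x_0)\big) = \int_0^1 \Big(\frac{\partial v_\theta(s, x_t, t)}{\partial \theta}\Big)^{\!\top} \nabla_x Q_t(s, x_t)\, dt
```

Suppose an estimate of ∇_x Q_t is available at each t. The policy gradient then reduces to a sum of local terms, which suggests how unrolling could be avoided. This excerpt does not say how Q-Flow actually learns Q_t.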
Findings
Q-Flow outperforms state-of-the-art baselines by 10.6 percentage points on OGBench.
Q-Flow also supports stable online adaptation within the same framework.
It avoids the trade-off between optimization stability and expressivity that existing approaches accept by restricting flow-based policies.
Abstract
There is growing interest in utilizing flow-based models as decision-making policies in reinforcement learning due to their high expressive capacity. However, effectively leveraging this expressivity for value maximization remains challenging, as naive gradient-based optimization requires backpropagating through numerical solvers and often leads to instability. Existing approaches typically address this issue by restricting the expressive capacity of flow-based policies, resulting in a trade-off between optimization stability and representational flexibility. To resolve this, we introduce Q-Flow, a framework that leverages the deterministic nature of flow dynamics to explicitly propagate terminal trajectory value to intermediate latent states along the policy-induced flow. This formulation enables stable policy optimization using intermediate value gradients without unrolling the…
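The toy sketch below illustrates the abstract's core idea: terminal value is propagated backward along a deterministic flow, so gradients can be taken locally instead of through the unrolled solver. Everything here is our own assumption and not the authors' implementation. We use a single fixed state, a 1-D latent, a linear velocity field v(x, t) = w·x + c discretized with K Euler steps, a quadratic terminal critic, and per-step quadratic value fits. The helper names (`fit_intermediate_values`, `one_step_gradient_wrt_c`) are hypothetical. Each intermediate value Q_k is regressed onto the bootstrapped target Q_{k+1}(next latent), and the policy gradient is built from one-step terms only.

```python
import numpy as np

# Toy setup (hypothetical, not from the paper): one fixed state, a 1-D latent,
# and a linear velocity field v(x, t) = w * x + c integrated with Euler steps.
W, C = 0.5, 1.0  # illustrative velocity-field parameters
K = 4            # number of Euler steps from noise (k = 0) to action (k = K)
DT = 1.0 / K


def euler_step(x, w=W, c=C):
    """One deterministic Euler step of dx/dt = w * x + c."""
    return x + DT * (w * x + c)


def rollout(x0, w=W, c=C):
    """Unroll the flow from a noise sample x0 to the action x_K."""
    xs = [x0]
    for _ in range(K):
        xs.append(euler_step(xs[-1], w, c))
    return xs


def terminal_q(a):
    """Toy terminal critic Q(s, a) = -(a - 2)^2 for the fixed state s."""
    return -(a - 2.0) ** 2


def fit_intermediate_values(num_samples=64, seed=0):
    """Propagate terminal value backward, one Euler step at a time.

    values[K] is the terminal critic; values[k] is fit to the bootstrapped
    target values[k + 1](euler_step(x)). No fit looks more than one step
    ahead. A quadratic fit is exact here because each step is affine.
    """
    rng = np.random.default_rng(seed)
    values = [None] * K + [np.poly1d([-1.0, 4.0, -4.0])]  # -(a - 2)^2 expanded
    for k in reversed(range(K)):
        xs = rng.normal(scale=3.0, size=num_samples)
        targets = values[k + 1](euler_step(xs))
        values[k] = np.poly1d(np.polyfit(xs, targets, deg=2))
    return values


def one_step_gradient_wrt_c(x0, values):
    """Gradient of Q(s, x_K) w.r.t. c built only from one-step local terms.

    Holding x_k fixed, d x_{k+1} / d c = DT, and dQ/dx_{k+1} is read off the
    intermediate value values[k + 1], so there is no backprop through earlier steps.
    """
    xs = rollout(x0)
    return sum(values[k + 1].deriv()(xs[k + 1]) * DT for k in range(K))


values = fit_intermediate_values()
x0 = 0.3

# Q_0 at the noise sample should equal the terminal value of the unrolled action.
print(values[0](x0), terminal_q(rollout(x0)[-1]))

# Compare the one-step gradient with a finite difference through the full rollout.
eps = 1e-5
fd_grad = (terminal_q(rollout(x0, c=C + eps)[-1])
           - terminal_q(rollout(x0, c=C - eps)[-1])) / (2 * eps)
print(one_step_gradient_wrt_c(x0, values), fd_grad)
```

In this toy the flow is affine and the critic is quadratic, so the fits are exact. As a result, Q_0 reproduces the terminal value of the unrolled action, and the summed one-step gradients match a finite-difference gradient of the unrolled objective. With learned, approximate value functions this agreement would hold only approximately.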
