Flow-Based Policy for Online Reinforcement Learning

Lei Lv; Yunfei Li; Yu Luo; Fuchun Sun; Tao Kong; Jiafeng Xu; Xiao Ma

arXiv:2506.12811·cs.LG·June 17, 2025



TL;DR

FlowRL is a framework for online reinforcement learning that represents the policy as a flow-based generative model, making the policy class more expressive, and trains it with an objective aligned with RL value maximization rather than static data imitation. The authors report improved performance on benchmark tasks.

Contribution

The paper proposes a novel flow-based policy representation integrated with Wasserstein-2 regularization, addressing optimization challenges in online RL and improving policy expressiveness.
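
For readers unfamiliar with the Wasserstein-2 distance named above, the following is a minimal sketch of the metric itself, not of the regularizer derived in the paper. It assumes one-dimensional samples with equal counts, in which case the optimal coupling simply pairs sorted samples. The function name `w2_empirical_1d` is hypothetical and chosen for illustration only.

```python
import math


def w2_empirical_1d(xs, ys):
    """Wasserstein-2 distance between two 1D empirical distributions.

    Illustrative assumption: both sample sets have the same size, so the
    optimal transport plan matches the i-th smallest x to the i-th smallest y.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("expected two non-empty sample lists of equal length")
    xs_sorted, ys_sorted = sorted(xs), sorted(ys)
    # Mean squared displacement under the sorted (monotone) coupling.
    mean_sq = sum((x - y) ** 2 for x, y in zip(xs_sorted, ys_sorted)) / len(xs)
    return math.sqrt(mean_sq)


# Shifting every sample by 1 moves each unit of mass a distance of 1.
print(w2_empirical_1d([2.0, 0.0, 1.0], [3.0, 1.0, 2.0]))
```

In a policy-regularization setting, a distance of this kind would be measured between action distributions, which are generally multi-dimensional and need a different estimator than this sorted-sample shortcut.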

Findings

01. FlowRL achieves competitive results on DMControl benchmarks.

02. The approach effectively aligns flow-based policies with RL objectives.

03. Empirical results demonstrate improved policy performance in complex environments.

Abstract

We present FlowRL, a novel framework for online reinforcement learning that integrates flow-based policy representation with Wasserstein-2-regularized optimization. We argue that in addition to training signals, enhancing the expressiveness of the policy class is crucial for the performance gains in RL. Flow-based generative models offer such potential, excelling at capturing complex, multimodal action distributions. However, their direct application in online RL is challenging due to a fundamental objective mismatch: standard flow training optimizes for static data imitation, while RL requires value-based policy optimization through a dynamic buffer, leading to difficult optimization landscapes. FlowRL first models policies via a state-dependent velocity field, generating actions through deterministic ODE integration from noise. We derive a constrained policy search objective…
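
The action-generation step described in the abstract (a state-dependent velocity field integrated as a deterministic ODE starting from noise) can be sketched as follows. This is a toy illustration under stated assumptions, not the paper's implementation: the hand-written `velocity` function stands in for a learned network, the fixed-step Euler integrator and the step count are arbitrary choices, and all names are hypothetical.

```python
import random


def velocity(state, action, t):
    """Hypothetical stand-in for a learned velocity field v(s, a, t).

    It transports any starting point along a straight line toward a
    state-dependent target (here simply 0.5 * state), reaching it at t = 1.
    """
    target = [0.5 * s for s in state]
    return [(g - a) / (1.0 - t) for g, a in zip(target, action)]


def sample_action(state, noise, num_steps=10):
    """Integrate da/dt = v(s, a, t) from t = 0 to t = 1 with Euler steps.

    Sampling is deterministic given the initial noise: all randomness
    enters through `noise`, not through the integration itself.
    """
    dt = 1.0 / num_steps
    action = list(noise)
    for k in range(num_steps):
        t = k * dt  # t stays strictly below 1, so 1 - t is never zero
        v = velocity(state, action, t)
        action = [a + dt * vi for a, vi in zip(action, v)]
    return action


rng = random.Random(0)
state = [1.0, -2.0]
noise = [rng.gauss(0.0, 1.0) for _ in state]  # a_0 drawn from N(0, I)
print(sample_action(state, noise))
```

Because this toy field sends every noise sample to the same target, the resulting policy is unimodal. A learned velocity field can map different noise samples to different actions, which is where the multimodal expressiveness mentioned in the abstract would come from.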


Taxonomy

Topics: Smart Grid Energy Management