Evolving Diffusion and Flow Matching Policies for Online Reinforcement Learning

Chubin Zhang; Zhenglin Wan; Feng Chen; Fuchao Yang; Lang Feng; Yaxin Zhou; Xingrui Yu; Yang You; Ivor Tsang; Bo An

arXiv:2512.02581 · cs.LG · March 10, 2026

TL;DR

This paper introduces GoRL, a new framework for online reinforcement learning that decouples policy optimization from generation, enabling stable training of highly expressive, multimodal policies that outperform existing methods on complex control tasks.

Contribution

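The paper proposes GoRL, an algorithm-agnostic framework that decouples optimization from generation: policy optimization is confined to a tractable latent space, while a conditional generative decoder synthesizes actions. This separation targets both training stability and policy expressiveness in online reinforcement learning.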

Findings

01. GoRL outperforms baseline methods on diverse control tasks.

02. GoRL achieves an episodic return above 870 on HopperStand, roughly triple that of the best baseline.

03. GoRL enables stable training of expressive, multimodal policies.

Abstract

Diffusion and flow matching policies offer expressive, multimodal action modeling, yet they are frequently unstable in online reinforcement learning (RL) due to intractable likelihoods and gradients propagating through long sampling chains. Conversely, tractable parameterizations such as Gaussians lack the expressiveness needed for complex control -- exposing a persistent tension between optimization stability and representational power. We address this tension with a key structural principle: decoupling optimization from generation. Building on this, we introduce GoRL (Generative Online Reinforcement Learning), an algorithm-agnostic framework that trains expressive policies from scratch by confining policy optimization to a tractable latent space while delegating action synthesis to a conditional generative decoder. Using a two-timescale alternating schedule and anchoring decoder…
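
As a rough illustration of the decoupling described in the abstract, the sketch below separates a tractable latent policy (whose log-probability is available in closed form for policy-gradient updates) from a decoder that turns a state and latent into an action. Everything here is an assumption made for illustration: the class names, the dimensions, the linear-Gaussian latent policy, and especially the decoder, which is a fixed tanh map standing in for the conditional diffusion or flow matching decoder. It is not the authors' implementation, and the two-timescale schedule and decoder training are not shown.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for this sketch.
STATE_DIM, LATENT_DIM, ACTION_DIM = 4, 2, 3


class LatentGaussianPolicy:
    """Tractable policy over latents z given state s (illustrative linear-Gaussian)."""

    def __init__(self):
        self.W = rng.normal(scale=0.1, size=(LATENT_DIM, STATE_DIM))
        self.log_std = np.zeros(LATENT_DIM)

    def sample(self, s):
        mean = self.W @ s
        return mean + np.exp(self.log_std) * rng.normal(size=LATENT_DIM)

    def log_prob(self, s, z):
        # Closed-form diagonal Gaussian log-density: this is the quantity
        # an RL algorithm would optimize, without touching the decoder.
        mean = self.W @ s
        var = np.exp(2.0 * self.log_std)
        terms = (z - mean) ** 2 / var + 2.0 * self.log_std + np.log(2.0 * np.pi)
        return float(-0.5 * np.sum(terms))


class Decoder:
    """Stand-in for a conditional generative decoder (assumption: a fixed tanh map)."""

    def __init__(self):
        self.V = rng.normal(scale=0.5, size=(ACTION_DIM, STATE_DIM + LATENT_DIM))

    def decode(self, s, z):
        return np.tanh(self.V @ np.concatenate([s, z]))


def act(policy, decoder, s):
    """Sample a latent, score it under the latent policy, decode it to an action."""
    z = policy.sample(s)
    logp = policy.log_prob(s, z)
    a = decoder.decode(s, z)
    return a, z, logp


policy, decoder = LatentGaussianPolicy(), Decoder()
state = np.ones(STATE_DIM)
action, latent, logp = act(policy, decoder, state)
```

The design point the sketch tries to convey is that `log_prob` depends only on the latent policy, so likelihood-based updates never need gradients through the decoder's sampling procedure.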

Taxonomy

Topics: Reinforcement Learning in Robotics · Generative Adversarial Networks and Image Synthesis · Music Technology and Sound Studies