Reverse Flow Matching: A Unified Framework for Online Reinforcement Learning with Diffusion and Flow Policies

Zeyang Li; Sunbochen Tang; Navid Azizan

arXiv:2601.08136·cs.LG·January 14, 2026

Open Access

TL;DR

This paper introduces reverse flow matching (RFM), a unified framework that improves training of diffusion and flow policies in online reinforcement learning by reducing variance and combining Q-value and gradient information.

Contribution

The paper presents RFM, a general method that unifies the training of diffusion and flow policies, extends Boltzmann distribution targeting, and combines Q-value and Q-gradient information for more efficient training.

Findings

1. RFM improves training stability and efficiency.

2. RFM improves performance on continuous-control benchmarks.

3. The unified framework encompasses existing methods as special cases.

Abstract

Diffusion and flow policies are gaining prominence in online reinforcement learning (RL) due to their expressive power, yet training them efficiently remains a critical challenge. A fundamental difficulty in online RL is the lack of direct samples from the target distribution; instead, the target is an unnormalized Boltzmann distribution defined by the Q-function. To address this, two seemingly distinct families of methods have been proposed for diffusion policies: a noise-expectation family, which utilizes a weighted average of noise as the training target, and a gradient-expectation family, which employs a weighted average of Q-function gradients. Yet, it remains unclear how these objectives relate formally or if they can be synthesized into a more general formulation. In this paper, we propose a unified framework, reverse flow matching (RFM), which rigorously addresses the problem of…
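
To make the two families described in the abstract concrete, here is a minimal sketch for a one-dimensional action. It is not taken from the paper: it assumes a Gaussian corruption a_t = alpha * a_0 + sigma * eps, a Boltzmann target proportional to exp(Q(a_0) / temperature), and self-normalized importance sampling. The function names, the `temperature` parameter, and the toy Q-function are hypothetical. The gradient form relies on the standard score identity for Gaussian smoothing (the score of the noised density is the posterior average of the clean score, divided by alpha), not on any result stated in the abstract.

```python
import math
import random


def _posterior_weights(q_fn, a0_samples, temperature):
    """Normalized weights proportional to exp(Q(a0) / temperature)."""
    logits = [q_fn(a0) / temperature for a0 in a0_samples]
    shift = max(logits)  # subtract the max for numerical stability
    unnorm = [math.exp(logit - shift) for logit in logits]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def noise_targets(q_fn, grad_q_fn, a_t, alpha, sigma, temperature,
                  n_samples, seed=0):
    """Estimate E[eps | a_t] in two ways (illustrative assumption, 1-D action).

    Returns (noise_form, grad_form):
      noise_form: weighted average of the sampled noise values.
      grad_form:  weighted average of Q-gradients, rescaled to noise units.
    """
    rng = random.Random(seed)
    eps = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    # Invert a_t = alpha * a_0 + sigma * eps: each noise draw implies a clean action.
    a0 = [(a_t - sigma * e) / alpha for e in eps]
    w = _posterior_weights(q_fn, a0, temperature)

    # Noise-expectation style target: weighted average of noise.
    noise_form = sum(wi * e for wi, e in zip(w, eps))

    # Gradient-expectation style target: weighted average of dQ/da at the
    # implied clean actions, mapped to noise via eps = -sigma * score(a_t).
    grad_avg = sum(wi * grad_q_fn(a) for wi, a in zip(w, a0))
    grad_form = -(sigma / (alpha * temperature)) * grad_avg
    return noise_form, grad_form


if __name__ == "__main__":
    # Toy quadratic Q, so the Boltzmann target is a standard normal.
    q = lambda a: -0.5 * a * a
    dq = lambda a: -a
    print(noise_targets(q, dq, a_t=0.5, alpha=0.8, sigma=0.6,
                        temperature=1.0, n_samples=100_000))
```

With this toy quadratic Q and alpha^2 + sigma^2 = 1, the exact conditional noise mean is sigma * a_t = 0.3, so both estimates should land close to that value. In this sketch the two forms differ only in which quantity is averaged under the same weights, which is one way to see why the families might be related.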

Taxonomy

Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Model Reduction and Neural Networks