Reverse Flow Matching: A Unified Framework for Online Reinforcement Learning with Diffusion and Flow Policies
Zeyang Li, Sunbochen Tang, Navid Azizan

TL;DR
This paper introduces reverse flow matching (RFM), a unified framework that improves training of diffusion and flow policies in online reinforcement learning by reducing variance and combining Q-value and gradient information.
Contribution
The paper presents RFM, a general method that unifies diffusion and flow policy training, extends Boltzmann distribution targeting, and combines Q-value and Q-gradient information to improve training efficiency.
Findings
RFM improves training stability and efficiency.
RFM improves performance on continuous-control benchmarks.
The unified framework recovers existing methods as special cases.
Abstract
Diffusion and flow policies are gaining prominence in online reinforcement learning (RL) due to their expressive power, yet training them efficiently remains a critical challenge. A fundamental difficulty in online RL is the lack of direct samples from the target distribution; instead, the target is an unnormalized Boltzmann distribution defined by the Q-function. To address this, two seemingly distinct families of methods have been proposed for diffusion policies: a noise-expectation family, which utilizes a weighted average of noise as the training target, and a gradient-expectation family, which employs a weighted average of Q-function gradients. Yet, it remains unclear how these objectives relate formally or if they can be synthesized into a more general formulation. In this paper, we propose a unified framework, reverse flow matching (RFM), which rigorously addresses the problem of…
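The two families named in the abstract can be illustrated on a toy problem. The sketch below is not the paper's RFM algorithm; it is a minimal example under stated assumptions: a one-dimensional action, a hypothetical quadratic Q-function for a single fixed state, a Boltzmann target proportional to exp(Q(a)/alpha), and additive Gaussian noising a_t = a_0 + sigma * eps. All function and variable names are illustrative. Under these assumptions, a self-normalized weighted average of noise and a self-normalized weighted average of Q-gradients both estimate the same quantity, the score of the noised target at a_t.

```python
import math
import random


def q_value(a):
    # Hypothetical toy Q-function for one fixed state (assumption, not from the paper).
    return -(a - 1.0) ** 2


def q_grad(a):
    # Analytic gradient of the toy Q-function with respect to the action.
    return -2.0 * (a - 1.0)


def score_estimates(a_t, sigma, alpha, num_samples, seed=0):
    """Estimate the score of the noised Boltzmann target at a_t in two ways."""
    rng = random.Random(seed)
    # Sample candidate noises and reconstruct the clean action each one implies.
    eps = [rng.gauss(0.0, 1.0) for _ in range(num_samples)]
    a0 = [a_t - sigma * e for e in eps]
    # Self-normalized Boltzmann weights, computed stably by subtracting the max.
    log_w = [q_value(a) / alpha for a in a0]
    max_log_w = max(log_w)
    w = [math.exp(lw - max_log_w) for lw in log_w]
    total = sum(w)
    w = [wi / total for wi in w]
    # Noise-expectation estimate: weighted average of the sampled noise.
    noise_mean = sum(wi * e for wi, e in zip(w, eps))
    # Gradient-expectation estimate: weighted average of Q-function gradients.
    grad_mean = sum(wi * q_grad(a) for wi, a in zip(w, a0))
    return -noise_mean / sigma, grad_mean / alpha


noise_score, grad_score = score_estimates(
    a_t=0.0, sigma=0.5, alpha=1.0, num_samples=20000
)
# For this toy Q the target is Gaussian with mean 1 and variance alpha / 2,
# so the noised target has variance 0.5 + sigma**2 and the exact score at 0 is 4/3.
exact_score = -(0.0 - 1.0) / (0.5 + 0.5 ** 2)
```

With a quadratic Q the exact score is available in closed form, which makes it easy to check that both Monte Carlo estimates land near the same value; for a learned Q-function no such closed form exists and the two estimators would generally differ in variance.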
Taxonomy
Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Model Reduction and Neural Networks
