FlowQ: Energy-Guided Flow Policies for Offline Reinforcement Learning

Marvin Alles; Nutan Chen; Patrick van der Smagt; Botond Cseke

arXiv:2505.14139·cs.LG·May 21, 2025

Open Access

TL;DR

FlowQ introduces an energy-guided flow matching method for offline reinforcement learning that improves training efficiency and performance without requiring guidance during inference.

Contribution

The paper presents energy-guided flow matching, an approach that incorporates guidance into the training of flow models, and FlowQ, an offline RL algorithm whose training time stays constant regardless of the number of flow sampling steps.

Findings

01 Achieves competitive performance in offline RL tasks.

02 Training time remains constant regardless of the number of flow sampling steps.

03 Effectively models multi-modal action distributions.

Abstract

The use of guidance to steer sampling toward desired outcomes has been widely explored within diffusion models, especially in applications such as image and trajectory generation. However, incorporating guidance during training remains relatively underexplored. In this work, we introduce energy-guided flow matching, a novel approach that enhances the training of flow models and eliminates the need for guidance at inference time. We learn a conditional velocity field corresponding to the flow policy by approximating an energy-guided probability path as a Gaussian path. Learning guided trajectories is appealing for tasks where the target distribution is defined by a combination of data and an energy function, as in reinforcement learning. Diffusion-based policies have recently attracted attention for their expressive power and ability to capture multi-modal action distributions.…
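As a rough illustration of why a flow matching objective can keep training cost independent of the number of sampling steps, the sketch below builds regression targets for plain conditional flow matching on a linear Gaussian path and then samples with an unguided Euler integrator. It is not the paper's algorithm: the energy-dependent term is omitted because the abstract does not give its form, and the names cfm_training_pair and euler_sample, the NumPy dependency, and the toy action batch are all assumptions made for this example.

```python
import numpy as np


def cfm_training_pair(actions, rng):
    """Build one regression example per data action (hypothetical helper).

    Uses the linear path x_t = (1 - t) * x_0 + t * x_1 with x_0 ~ N(0, I).
    Each example needs a single random time t, so the cost of a training
    step does not depend on how many steps are used later for sampling.
    """
    noise = rng.standard_normal(actions.shape)      # x_0, the path's start
    t = rng.uniform(size=(actions.shape[0], 1))     # one time per example
    x_t = (1.0 - t) * noise + t * actions           # point on the path
    target_velocity = actions - noise               # d x_t / d t on this path
    return x_t, t, target_velocity


def euler_sample(velocity_fn, x0, n_steps):
    """Integrate dx/dt = velocity_fn(x, t) from t = 0 to t = 1.

    No guidance term is evaluated here; only the learned velocity is used.
    """
    x = np.array(x0, dtype=float)
    dt = 1.0 / n_steps
    for k in range(n_steps):
        x = x + dt * velocity_fn(x, k * dt)
    return x


rng = np.random.default_rng(0)
actions = rng.uniform(-1.0, 1.0, size=(4, 2))       # toy batch of 2-D actions
x_t, t, u = cfm_training_pair(actions, rng)

# A constant velocity field stands in for a trained network in this demo.
goal = np.array([0.5, -0.25])
sample = euler_sample(lambda x, t: goal, np.zeros(2), n_steps=10)
```

In an actual training loop, a network v(x_t, t, state) would be regressed onto target_velocity with a squared error, and any energy guidance would enter at that stage rather than inside euler_sample.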

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Reinforcement Learning in Robotics · Human Pose and Action Recognition

Methods: Softmax · Attention Is All You Need · Diffusion