Flow Q-Learning

Seohong Park; Qiyang Li; Sergey Levine

arXiv:2502.02538·cs.LG·May 27, 2025

TL;DR

Flow Q-Learning (FQL) is an offline RL method that uses a flow-matching policy to model complex action distributions and trains an expressive one-step policy with RL. This avoids recursive backpropagation during training and iterative action generation at test time, and it performs strongly across 73 OGBench and D4RL tasks.

Contribution

FQL trains an expressive one-step policy with RL instead of directly guiding an iterative flow policy to maximize values, which removes the need for recursive backpropagation through the action generation process.

Findings

01

Achieves strong results on 73 challenging state- and pixel-based OGBench and D4RL tasks, in both offline RL and offline-to-online RL.

02

Models arbitrarily complex action distributions in the data with an expressive flow-matching policy.

03

Avoids unstable recursive backpropagation during training and costly iterative action generation at test time.

Abstract

We present flow Q-learning (FQL), a simple and performant offline reinforcement learning (RL) method that leverages an expressive flow-matching policy to model arbitrarily complex action distributions in data. Training a flow policy with RL is a tricky problem, due to the iterative nature of the action generation process. We address this challenge by training an expressive one-step policy with RL, rather than directly guiding an iterative flow policy to maximize values. This way, we can completely avoid unstable recursive backpropagation, eliminate costly iterative action generation at test time, yet still mostly maintain expressivity. We experimentally show that FQL leads to strong performance across 73 challenging state- and pixel-based OGBench and D4RL tasks in offline RL and offline-to-online RL. Project page: https://seohong.me/projects/fql/
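The abstract describes two policies: an iterative flow policy that models the dataset actions, and a one-step policy that is trained with RL. The following is a minimal NumPy sketch of how such losses could look, not the authors' implementation. It assumes a linear-interpolation flow-matching objective, Euler integration for the iterative policy, and a one-step objective that combines a Q-value term with a distillation term toward the flow policy's output. All function names, the weight `alpha`, and the toy stand-in models are hypothetical and chosen for illustration only.

```python
import numpy as np

def flow_matching_loss(velocity_fn, states, actions, rng):
    """Flow-matching loss on dataset (state, action) pairs (assumed linear path)."""
    noise = rng.standard_normal(actions.shape)      # x_0 ~ N(0, I)
    t = rng.uniform(size=(actions.shape[0], 1))     # t ~ U[0, 1]
    x_t = (1.0 - t) * noise + t * actions           # point on the noise-to-action path
    target = actions - noise                        # velocity of that path
    pred = velocity_fn(states, x_t, t)
    return np.mean(np.sum((pred - target) ** 2, axis=-1))

def euler_sample(velocity_fn, states, noise, num_steps=10):
    """Iterative action generation: integrate the velocity field from t=0 to t=1."""
    x = noise
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = np.full((x.shape[0], 1), i * dt)
        x = x + dt * velocity_fn(states, x, t)
    return x

def one_step_policy_loss(one_step_fn, velocity_fn, q_fn, states, noise, alpha=1.0):
    """Assumed objective: maximize Q while staying close to the flow policy's action."""
    actions = one_step_fn(states, noise)                     # single forward pass
    flow_actions = euler_sample(velocity_fn, states, noise)  # used as a fixed target
    distill = np.mean(np.sum((actions - flow_actions) ** 2, axis=-1))
    q_term = -np.mean(q_fn(states, actions))
    return q_term + alpha * distill

# Toy stand-ins (hypothetical): a constant velocity field, a shift-by-one
# one-step policy, and a quadratic critic.
rng = np.random.default_rng(0)
states = rng.standard_normal((4, 3))
actions = rng.standard_normal((4, 2))
noise = rng.standard_normal((4, 2))

const_velocity = lambda s, x, t: np.ones_like(x)
shift_policy = lambda s, z: z + 1.0
quadratic_q = lambda s, a: -np.sum(a ** 2, axis=-1)

fm = flow_matching_loss(const_velocity, states, actions, rng)
flow_actions = euler_sample(const_velocity, states, noise)
loss = one_step_policy_loss(shift_policy, const_velocity, quadratic_q, states, noise)
```

In this sketch the Q-value term only touches the one-step policy, so with an autodiff framework its gradient would pass through a single forward pass rather than through every Euler step. The flow policy's output would be treated as a constant target in that setting. NumPy computes no gradients, so the block only illustrates how the loss values are formed.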

Taxonomy

Topics: Big Data and Business Intelligence · Data Stream Mining Techniques

Methods: Q-Learning