One-Step Flow Q-Learning: Addressing the Diffusion Policy Bottleneck in Offline Reinforcement Learning

Thanh Nguyen; Chang D. Yoo

arXiv:2508.13904·cs.LG·February 25, 2026

TL;DR

This paper introduces One-Step Flow Q-Learning (OFQL), an offline reinforcement learning method that generates actions in a single step. Compared with Diffusion Q-Learning (DQL), it speeds up training and inference and improves robustness, while reporting state-of-the-art results on D4RL.

Contribution

OFQL reformulates diffusion Q-learning within the Flow Matching framework to enable direct one-step action generation without auxiliary modules or distillation.

Findings

01

OFQL outperforms multi-step DQL in benchmark tests.

02

OFQL reduces computation during training and inference.

03

OFQL achieves state-of-the-art performance on D4RL.

Abstract

Diffusion Q-Learning (DQL) has established diffusion policies as a high-performing paradigm for offline reinforcement learning, but its reliance on multi-step denoising for action generation renders both training and inference slow and fragile. Existing efforts to accelerate DQL toward one-step denoising typically rely on auxiliary modules or policy distillation, sacrificing either simplicity or performance. It remains unclear whether a one-step policy can be trained directly without such trade-offs. To this end, we introduce One-Step Flow Q-Learning (OFQL), a novel framework that enables effective one-step action generation during both training and inference, without auxiliary modules or distillation. OFQL reformulates the DQL policy within the Flow Matching (FM) paradigm but departs from conventional FM by learning an average velocity field that directly supports accurate one-step…
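To make the "average velocity field" idea concrete, here is a minimal toy sketch, not the authors' implementation. It assumes the MeanFlow-style convention that t = 1 is noise and t = 0 is the generated action, with the average velocity over an interval [r, t] defined as u(z_t, r, t) = (z_t - z_r) / (t - r). The velocity field `A * z`, the constant `A`, and all function names are hypothetical, chosen only because the ODE has a closed-form solution. In OFQL, u would be a learned network rather than a formula.

```python
import math

A = 1.5  # hypothetical rate of the toy velocity field (assumption, not from the paper)

def instantaneous_velocity(z, t):
    """Toy instantaneous velocity v(z, t) = A * z, which yields a curved trajectory."""
    return A * z

def average_velocity(z_t, r, t):
    """Closed-form average of v along the trajectory through z_t over [r, t]."""
    if t == r:
        return instantaneous_velocity(z_t, t)
    z_r = z_t * math.exp(A * (r - t))  # exact ODE solution carried from time t to time r
    return (z_t - z_r) / (t - r)

def euler_sample(z_1, num_steps):
    """Integrate dz/dt = v from t=1 (noise) down to t=0 (action) with Euler steps."""
    z, dt = z_1, 1.0 / num_steps
    for k in range(num_steps):
        t = 1.0 - k * dt
        z = z - dt * instantaneous_velocity(z, t)
    return z

def one_step_sample(z_1):
    """Single jump z_0 = z_1 - (1 - 0) * u(z_1, 0, 1) using the average velocity."""
    return z_1 - average_velocity(z_1, 0.0, 1.0)

if __name__ == "__main__":
    z_1 = 2.0
    exact = z_1 * math.exp(-A)
    for name, value in [
        ("1 Euler step with v", euler_sample(z_1, 1)),
        ("100 Euler steps with v", euler_sample(z_1, 100)),
        ("1 step with average u", one_step_sample(z_1)),
    ]:
        print(f"{name:>24}: z_0 = {value:+.4f}, |error| = {abs(value - exact):.4f}")
```

In this toy setting, a single Euler step with the instantaneous velocity overshoots badly because the trajectory is curved, while a single step with the average velocity lands on the exact endpoint by construction. That is the intuition for why a policy parameterized by u can, in principle, be sampled in one step without multi-step integration.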

Peer Reviews

Decision·ICLR 2026 Poster

Reviewer 01·Rating 6·Confidence 4

Strengths

The method is simple, clear, and effective. By replacing only the diffusion policy component with the mean-flow policy, the approach achieves both higher sampling efficiency and competitive performance. The toy example nicely illustrates the advantage of reparameterizing from $v$ to $u$, providing a clearer intuition for the underlying mechanism.

Weaknesses

Given that mean-flow generative modeling has already shown strong one-step FID results on image generation tasks, it would be valuable to see this approach applied to more complex environments beyond D4RL, such as robotic control or high-dimensional decision-making settings.

Reviewer 02·Rating 8·Confidence 4

Strengths

* **Clear conceptual advancement:** Reformulating DQL under the flow-matching framework and introducing an average velocity field is a novel and elegant idea that directly addresses the core inefficiency of multi-step denoising.
* **Simplicity and effectiveness:** Unlike prior one-step approaches that depend on auxiliary modules or policy distillation, OFQL remains conceptually clean while achieving superior results.
* **Strong empirical results:** The method outperforms DQL and other diffusion-

Weaknesses

* The theoretical justification for why learning an **average velocity field** leads to better one-step performance could be elaborated further. Currently, the paper provides an intuitive explanation but lacks a deeper analytical connection to diffusion dynamics.

Reviewer 03·Rating 4·Confidence 4

Strengths

1. The method shows empirical advantages in policy performance, training speed, and inference time.
2. The paper is easy to follow.

Weaknesses

1. The proposed method lacks novelty. The only main difference between the proposed method and DQL is replacing the diffusion loss in actor training with a MeanFlow loss.
2. The experiments are not adequate. Only results on state-based D4RL tasks are included, and no visual observation task results are reported.
3. The argument in Lines 262-264 is not clear. Flow matching cannot "in principle, enable one-step generation", as the sampling trajectory is straight only when the target distribution i

Taxonomy

Topics: Cognitive Science and Education Research