Decoupled MeanFlow: Turning Flow Models into Flow Maps for Accelerated Sampling

Kyungmin Lee; Sihyun Yu; Jinwoo Shin

arXiv:2510.24474·cs.CV·October 29, 2025

TL;DR

Decoupled MeanFlow introduces a decoding strategy that transforms pretrained flow models into flow map models, enabling high-quality image generation with significantly fewer steps and faster inference without architectural changes.

Contribution

It presents a simple, effective method to convert pretrained flow models into flow maps, enhancing sampling speed and efficiency in generative modeling.

Findings

01. Achieves 1-step FID of 2.16 on ImageNet 256x256

02. Attains 4-step FID of 1.68, close to flow models' performance

03. Over 100x faster inference compared to traditional flow models

Abstract

Denoising generative models, such as diffusion and flow-based models, produce high-quality samples but require many denoising steps due to discretization error. Flow maps, which estimate the average velocity between timesteps, mitigate this error and enable faster sampling. However, their training typically demands architectural changes that limit compatibility with pretrained flow models. We introduce Decoupled MeanFlow, a simple decoding strategy that converts flow models into flow map models without architectural modifications. Our method conditions the final blocks of diffusion transformers on the subsequent timestep, allowing pretrained flow models to be directly repurposed as flow maps. Combined with enhanced training techniques, this design enables high-quality generation in as few as 1 to 4 steps. Notably, we find that training flow models and subsequently converting them is…
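
The decoupled conditioning described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: `ToyBlock`, `time_embed`, `n_decoder`, and the update rule `x_r = x_t + (r - t) * u` are assumptions chosen for illustration, with NumPy residual layers standing in for diffusion transformer blocks. The sketch only shows the structural idea that the first blocks see the current timestep `t` while the final blocks see the subsequent timestep `r`, so that setting `r = t` recovers the original flow model's forward pass.

```python
import numpy as np

def time_embed(tau, dim):
    """Toy sinusoidal embedding of a scalar timestep (hypothetical helper)."""
    freqs = np.arange(1, dim + 1)
    return np.sin(freqs * tau)

class ToyBlock:
    """Stand-in for a transformer block: a residual layer modulated by a timestep."""
    def __init__(self, dim, rng):
        self.W = rng.normal(scale=0.1, size=(dim, dim))

    def __call__(self, h, tau):
        return h + np.tanh(h @ self.W + time_embed(tau, h.shape[-1]))

def flow_model(blocks, x, t):
    """Ordinary flow model: every block is conditioned on the current timestep t."""
    h = x
    for blk in blocks:
        h = blk(h, t)
    return h

def decoupled_flow_map(blocks, x, t, r, n_decoder):
    """Same weights, but the last n_decoder blocks are conditioned on r instead of t."""
    n_encoder = len(blocks) - n_decoder
    h = x
    for blk in blocks[:n_encoder]:
        h = blk(h, t)   # "encoder" blocks: current timestep
    for blk in blocks[n_encoder:]:
        h = blk(h, r)   # "decoder" blocks: subsequent timestep
    return h

def sample(blocks, x, timesteps, n_decoder):
    """Few-step sampling: treat the output as an average velocity over [t, r] (assumed convention)."""
    for t, r in zip(timesteps[:-1], timesteps[1:]):
        u = decoupled_flow_map(blocks, x, t, r, n_decoder)
        x = x + (r - t) * u
    return x

rng = np.random.default_rng(0)
dim = 4
blocks = [ToyBlock(dim, rng) for _ in range(6)]
x = rng.normal(size=(2, dim))

# With r == t the decoupled model is identical to the flow model.
same_when_r_equals_t = np.allclose(
    decoupled_flow_map(blocks, x, 0.7, 0.7, n_decoder=2),
    flow_model(blocks, x, 0.7),
)
# One-step and four-step sampling reuse the same blocks; only the timestep grid changes.
one_step = sample(blocks, x, [1.0, 0.0], n_decoder=2)
four_step = sample(blocks, x, [1.0, 0.75, 0.5, 0.25, 0.0], n_decoder=2)
```

Because no new parameters are introduced in this sketch, pretrained block weights could be loaded unchanged, which mirrors the abstract's claim of conversion without architectural modifications. How many blocks to treat as the decoder, and how the model is trained afterwards, are not specified in this excerpt.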

Peer Reviews

Decision·ICLR 2026 Poster

Reviewer 01 · Rating 4 · Confidence 5

Strengths

- The central contribution of this work is the simple time conditioning modification. This leads to notable performance gains, as shown in the ablation studies in Tables 1 and 2, validating the effectiveness of the decoupled design.
- Overall, a 1-step FID of 2.16 on ImageNet 256x256 is an impressive state-of-the-art result.

Weaknesses

- This work is motivated by the encoder-decoder design. Yet the experiments do not provide direct evidence that the encoder-decoder design is the key to high quality. For example, there are many other ways to condition the network without modifying its architecture, like interleaving $t$ and $r$ across the DiT blocks. Discussing alternative design choices in the ablation could strengthen the argument.
- The authors claim that an existing SiT can be converted into a flow map without finetuning, yet

Reviewer 02 · Rating 6 · Confidence 3

Strengths

- The proposed DMF model consistently outperforms the MeanFlow baseline across all evaluated datasets and variants. Notably, it achieves high-quality 1-step ImageNet generation which highlights its efficiency and strong generative capacity.
- The model can be trained from scratch, yet it also seamlessly integrates with existing pretrained models without requiring any architectural modifications while yielding improved results.
- The analysis of the encoder–decoder decomposition is interesting.

Weaknesses

- While the separation of encoder and decoder components is appealing, it also seems natural to consider joint conditioning mechanisms that integrate information from both the current and target timesteps in some blocks, potentially via lightweight modifications such as joint AdaLN conditioning or LoRA adapters. Have the authors explored such hybrid alternatives?
- Given that MeanFlow already incorporates both timestep conditionings, one might expect the model to implicitly learn to attenuate or

Reviewer 03 · Rating 6 · Confidence 5

Strengths

1. **Training-free flow map transformation** The proposed decoupled architecture allows pretrained flow models to be directly repurposed as flow maps without additional fine-tuning, which is both conceptually elegant and practically impactful. This demonstrates a viable paradigm for transformer-based flow map models that leverages existing large-scale flow model checkpoints, reducing training cost and broadening applicability.
2. **Broader applicability of the fine-tuning paradigm** The propo

Weaknesses

1. **Stability of the JVP term** The proposed method does not directly address the well-known stability issue of the JVP term. This instability has been repeatedly identified as the primary bottleneck in scaling consistency-based methods to large-scale applications such as text-to-image or text-to-video generation (Lu & Song, 2024; Chen et al., 2025; Zheng et al., 2025). Therefore, while the techniques presented in the paper for improving MeanFlow training remain valuable, the overall scope of
