SenseFlow: Scaling Distribution Matching for Flow-based Text-to-Image Distillation
Xingtong Ge, Xin Zhang, Tongda Xu, Yi Zhang, Xinjie Zhang, Yan Wang, Jun Zhang

TL;DR
SenseFlow introduces implicit distribution alignment and intra-segment guidance to improve the scalability and convergence of flow-based text-to-image model distillation, achieving superior performance on large models.
Contribution
The paper proposes novel methods IDA and ISG to address convergence issues in large-scale flow-based text-to-image distillation, enabling effective training of models like SD 3.5 and FLUX.
Findings
IDA enables convergence of DMD on SD 3.5
Combining IDA and ISG improves convergence on SD 3.5 and FLUX
SenseFlow outperforms previous distillation methods on large diffusion models
Abstract
The Distribution Matching Distillation (DMD) has been successfully applied to text-to-image diffusion models such as Stable Diffusion (SD) 1.5. However, vanilla DMD suffers from convergence difficulties on large-scale flow-based text-to-image models, such as SD 3.5 and FLUX. In this paper, we first analyze the issues when applying vanilla DMD on large-scale models. Then, to overcome the scalability challenge, we propose implicit distribution alignment (IDA) to constrain the divergence between the generator and the fake distribution. Furthermore, we propose intra-segment guidance (ISG) to relocate the timestep denoising importance from the teacher model. With IDA alone, DMD converges for SD 3.5; employing both IDA and ISG, DMD converges for SD 3.5 and FLUX.1 dev. Together with a scaled VFM-based discriminator, our final model, dubbed SenseFlow, achieves superior performance in…
Peer Reviews
Decision · ICLR 2026 Poster
1. The paper studies the problems of DMD and proposes a fix. The proposed method achieves good results for finetuning Flux and SD3.5. 2. The paper proposes good ablation experiments to help readers understand the contribution of each component. 3. The proposed method is faster compared with DMD.
1. Although I recognize IDA as a meaningful contribution, it only works when the student model and the fake model share the same network structure. For example, VSD (DMD) was originally designed for text-to-3D. More broadly, distribution matching does not require the student model to be the same as the teacher model (the real model here). In these general settings, IDA does not work. 2. The images in Fig. 5 for Ours-SD3.5 seem too bright. The same holds for Fig. 10. What coul
+ IDA directly addresses the brittle inner loop in DMD via an explicit, cheap parameter interpolation; Appendix A formalizes that this bounds the generator–fake field gap. The “Training‑hours vs. FID” plot (Fig. 3) shows markedly smoother convergence with IDA on SD‑3.5 Large. + Experiments span three strong teachers (SDXL 2.6B, SD‑3.5 Large 8B, FLUX.1‑dev 12B) with contemporary baselines (LCM/PCM/Lightning/Hyper, SD‑3.5‑Turbo, FLUX‑Turbo/‑schnell). Results cover COCO‑5K, GenEval, and T2I‑CompBe
+ The paper notes that adding the VFM discriminator improves human‑preference proxies while slightly raising FID‑T (interpreted as reduced diversity). However, there is no explicit diversity study (e.g., user study, precision–recall curves over seeds). Given the importance of this trade‑off in distillation, a quantitative diversity analysis would strengthen the claims. + The authors seem to omit IDA on the smaller SDXL model but claim generality. A small ablation whether IDA helps small-scale mode
1. The paper is well-written and well-structured, presenting a clear and compelling motivation for the study. 2. The proposed approach is thoughtfully designed, addressing the common challenges of stability and scalability in distribution-matching distillation. 3. The experiments are comprehensive, supported by detailed analysis and clear explanations.
1. The effect of ISG is not very clear. Although Table 2 shows slight improvements in quality and human-preference metrics, no qualitative comparisons are provided for this component. Including an additional visualization—similar to Figure 3—or a reconstruction loss curve could better illustrate ISG’s impact on training stability and convergence speed. 2. The design of the VFM discriminator also requires further clarification, particularly regarding the rationale for incorporating another refere
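The reviews above describe IDA as a cheap parameter interpolation that keeps the generator and the fake model close, and note that it requires both networks to share one architecture. A minimal Python sketch of that idea follows. It is not the authors' implementation: the function names, the plain-dict parameter layout, the direction of the update (fake model pulled toward the generator), and the coefficient `lam` are all assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical parameter container: layer name -> flat list of weights.
Params = Dict[str, List[float]]


def ida_interpolate(fake: Params, generator: Params, lam: float) -> Params:
    """Return lam * fake + (1 - lam) * generator, layer by layer.

    Assumption: the fake model is pulled toward the generator; the source
    only states that IDA is a parameter interpolation.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    # Interpolation is only defined when both models have the same layout.
    if fake.keys() != generator.keys():
        raise ValueError("fake and generator must share the same structure")
    out: Params = {}
    for name, f_vals in fake.items():
        g_vals = generator[name]
        if len(f_vals) != len(g_vals):
            raise ValueError(f"shape mismatch in layer {name!r}")
        out[name] = [lam * f + (1.0 - lam) * g for f, g in zip(f_vals, g_vals)]
    return out


def param_gap(a: Params, b: Params) -> float:
    """Largest absolute difference between corresponding parameters."""
    return max(abs(x - y) for name in a for x, y in zip(a[name], b[name]))
```

Under these assumptions, each call shrinks the fake-to-generator parameter gap by a factor of `lam`, and the structure check reflects the reviewer's remark that the interpolation is undefined when the two networks differ in architecture.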
Taxonomy
Topics: Video Analysis and Summarization · Advanced Steganography and Watermarking Techniques · Generative Adversarial Networks and Image Synthesis
