SoFlow: Solution Flow Models for One-Step Generative Modeling
Tianze Luo, Haotian Yuan, Zhuang Liu

TL;DR
SoFlow introduces a one-step generative modeling framework. It improves efficiency and performance by exploiting the relationship between the velocity function and the solution function of the velocity ODE, together with new loss functions, and it outperforms MeanFlow on ImageNet 256x256.
Contribution
The paper proposes Solution Flow Models (SoFlow), a novel one-step generative approach with new loss functions that enhance training efficiency and generation quality.
Findings
Achieves better FID-50K scores than MeanFlow on ImageNet 256x256.
Uses a Flow Matching loss so the model can supply estimated velocity fields for Classifier-Free Guidance (CFG) during training.
Employs a solution consistency loss that avoids Jacobian-vector product calculations.
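The two losses listed above can be illustrated with a toy, gradient-free sketch. Everything in it is an assumption for illustration: a 1-D linear path x_t = (1 - t) x0 + t ε (t = 1 is noise, t = 0 is data), the Euler parametrization f(x, t, s) = x + (s - t) F(x, t, s) discussed in the reviews, and hypothetical helpers `flow_matching_loss` and `consistency_loss`. It is not the paper's exact objective, because time sampling, loss weighting, and the choice of the nearby time l are not given here. It only shows how a consistency target can come from a second forward pass at a nearby point on the same path instead of from a Jacobian-vector product.

```python
import random

# Toy 1-D setup (assumption): x_t = (1 - t) * x0 + t * eps, so t = 1 is pure
# noise, t = 0 is data, and the conditional velocity is dx_t/dt = eps - x0.

def interpolate(x0: float, eps: float, t: float) -> float:
    return (1.0 - t) * x0 + t * eps

def solution_fn(F, x: float, t: float, s: float) -> float:
    # Euler-parametrized solution function: f(x, t, s) = x + (s - t) * F(x, t, s).
    return x + (s - t) * F(x, t, s)

def flow_matching_loss(F, x0: float, eps: float, t: float) -> float:
    # d/ds f(x, t, s) at s = t equals F(x, t, t), so F(., t, t) acts as the
    # velocity estimate and is regressed onto the conditional velocity.
    x_t = interpolate(x0, eps, t)
    return (F(x_t, t, t) - (eps - x0)) ** 2

def consistency_loss(F, x0: float, eps: float, t: float, l: float, s: float) -> float:
    # Two plain forward evaluations: from (x_t, t) and from a nearby point
    # (x_l, l) on the same noise/data path. With autodiff, the target would be
    # wrapped in stop-gradient; no Jacobian-vector product is involved.
    x_t = interpolate(x0, eps, t)
    x_l = interpolate(x0, eps, l)
    target = solution_fn(F, x_l, l, s)
    return (solution_fn(F, x_t, t, s) - target) ** 2

X0 = 2.0  # hypothetical single data point

def oracle_F(x: float, t: float, s: float) -> float:
    # Exact F for single-point data: paths are straight, so F = (x - X0) / t.
    return (x - X0) / t

def frozen_F(x: float, t: float, s: float) -> float:
    # A deliberately wrong model that never moves the sample.
    return 0.0

rng = random.Random(0)
eps = rng.gauss(0.0, 1.0)
t, l, s = 0.8, 0.75, 0.0
fm_oracle = flow_matching_loss(oracle_F, X0, eps, t)
cons_oracle = consistency_loss(oracle_F, X0, eps, t, l, s)
fm_frozen = flow_matching_loss(frozen_F, X0, eps, t)
cons_frozen = consistency_loss(frozen_F, X0, eps, t, l, s)
print(fm_oracle, cons_oracle, fm_frozen, cons_frozen)
```

With the oracle F both losses are zero up to rounding, while the frozen model is penalized by both.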
Abstract
The multi-step denoising process in diffusion and Flow Matching models causes major efficiency issues, which motivates research on few-step generation. We present Solution Flow Models (SoFlow), a framework for one-step generation from scratch. By analyzing the relationship between the velocity function and the solution function of the velocity ordinary differential equation (ODE), we propose a Flow Matching loss and a solution consistency loss to train our models. The Flow Matching loss allows our models to provide estimated velocity fields for Classifier-Free Guidance (CFG) during training, which improves generation performance. Notably, our consistency loss does not require the calculation of the Jacobian-vector product (JVP), a common requirement in recent works that is not well-optimized in deep learning frameworks like PyTorch. Experimental results indicate that, when trained from…
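The abstract says the Flow Matching loss lets the model supply velocity estimates for Classifier-Free Guidance during training. The sketch below shows only the standard CFG combination v_u + w (v_c - v_u) applied to such estimates. `cfg_velocity`, the `velocity(x, t, label)` signature with `label=None` for the unconditional branch, and the toy field are hypothetical, and how SoFlow folds the guided velocity into its training targets is not described in this summary.

```python
from typing import Callable, Optional

# Hypothetical signature: velocity(x, t, label) -> float; label=None selects
# the unconditional (label-dropped) branch of the same model.
Velocity = Callable[[float, float, Optional[int]], float]

def cfg_velocity(velocity: Velocity, x: float, t: float, label: int, w: float) -> float:
    """Classifier-free guidance: move from the unconditional toward the conditional velocity."""
    v_uncond = velocity(x, t, None)
    v_cond = velocity(x, t, label)
    return v_uncond + w * (v_cond - v_uncond)

# Toy velocity field (made up): each class is a single point, and the
# unconditional branch pretends the data sits at 0.0.
CLASS_POINTS = {0: -1.0, 1: 3.0}

def toy_velocity(x: float, t: float, label: Optional[int]) -> float:
    target = 0.0 if label is None else CLASS_POINTS[label]
    return (x - target) / t

if __name__ == "__main__":
    for w in (0.0, 1.0, 2.0):
        print(w, cfg_velocity(toy_velocity, x=0.5, t=0.5, label=1, w=w))
```

Setting w = 1 recovers the conditional velocity, w = 0 the unconditional one, and w > 1 extrapolates beyond the conditional estimate.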
Peer Reviews
Decision: ICLR 2026 Poster
1. The paper is well written (with slightly confusing notation). 2. The proposed method is backed by theory. 3. The proposed method achieves state-of-the-art results (though comparable with baselines). 4. The proposed method suggests that, in practice, the JVP used in [1,2] is not needed, potentially saving training cost.
1. The proposed method is of low novelty: a. The presented derivation is very close to Align Your Flow [1]. b. The proposed preconditioning (referred to as "Euler parametrization") of the flow map $f_{\theta}(x_t,t,s) = x_t + (s-t)F_{\theta}(x_t,t,s)$ was already suggested by [1] and essentially makes the actual network $F_{\theta}$ learn the mean field, as in [1]. c. Equation 13 was already used by [1], and composed with the Euler parametrization returns the m
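The review's point that the Euler parametrization makes F_θ learn a mean velocity can be checked numerically on a toy ODE. This sketch assumes a made-up field dx/dt = x with closed-form solution x_s = x_t e^(s - t); `euler_param_f`, `euler_param_F`, and `mean_velocity` are hypothetical helpers. If f(x_t, t, s) = x_t + (s - t) F(x_t, t, s) is exact, F must equal (x_s - x_t)/(s - t), which is the average velocity along the trajectory between t and s; the code compares it with a midpoint-rule average.

```python
import math

def velocity(x: float, t: float) -> float:
    # Toy, time-independent velocity field (assumption): dx/dt = x.
    return x

def exact_solution(x_t: float, t: float, s: float) -> float:
    # Closed-form solution of dx/dt = x, evolved from time t to time s.
    return x_t * math.exp(s - t)

def euler_param_f(F, x_t: float, t: float, s: float) -> float:
    # Euler parametrization: f(x_t, t, s) = x_t + (s - t) * F(x_t, t, s).
    # At s == t this returns x_t for any F, so the boundary condition is built in.
    return x_t + (s - t) * F(x_t, t, s)

def euler_param_F(x_t: float, t: float, s: float) -> float:
    # The unique F that makes euler_param_f exact (requires s != t).
    return (exact_solution(x_t, t, s) - x_t) / (s - t)

def mean_velocity(x_t: float, t: float, s: float, n: int = 10_000) -> float:
    # Midpoint-rule average of v(x(u), u) for u between t and s,
    # evaluated along the exact trajectory that starts at x_t at time t.
    h = (s - t) / n
    total = 0.0
    for i in range(n):
        u = t + (i + 0.5) * h
        total += velocity(exact_solution(x_t, t, u), u)
    return total / n

print(euler_param_F(1.5, 1.0, 0.0), mean_velocity(1.5, 1.0, 0.0))
```

As s approaches t, the same F tends to the instantaneous velocity v(x_t, t), which is why one network can serve both as a velocity estimator and as a one-step map.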
- The ImageNet 1-NFE results of the proposed method are very strong, beating MeanFlow without using the inefficient JVP operation. This could send a very positive message to the community: JVP may not really be necessary for strong performance in consistency training/distillation; rather, a well-designed training objective with careful CFG handling can also achieve very competitive results. - All technical details (theories, design choices, ablation results) look very sound and the presenta
- While I'm very positive based on the evaluation results, this manuscript does not really introduce novel fundamental approaches that clearly distinguish it from prior art. For example: - The model architecture is similar to prior CM extensions that learn mappings from arbitrary t to s (e.g., CTM, MeanFlow). - Combining an FM loss with a consistency loss is also seen in prior work (e.g., CTM). - The CFG handling method can be broadly seen as an online guidance distillation approach (where t
1. This paper proposes a novel method for one-step generation. It shows, with sound theoretical proofs, how to approximate the unique solution of the FM ODE, and then builds a learning target to train the model. In this way, one-step generation can be regarded as the solution from T to 0, which SoFlow computes directly. To the best of my knowledge, this method is the first to achieve this and will truly benefit the development of FM. 2. The experimental resu
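The review frames one-step generation as evaluating the solution from T to 0. A toy comparison under made-up assumptions: the field dx/dt = x has exact solution x_0 = x_T e^(-T), so one call to the solution map lands on the endpoint, while explicit Euler integration of the same ODE needs many velocity evaluations (NFE) to approach it. The function `solution` stands in for a trained solution network; a learned map would only be approximate.

```python
import math

def velocity(x: float, t: float) -> float:
    # Toy velocity field (assumption): dx/dt = x.
    return x

def solution(x_t: float, t: float, s: float) -> float:
    # Exact solution map for the toy field; a stand-in for a learned f(x_t, t, s).
    return x_t * math.exp(s - t)

def euler_sample(x_T: float, T: float, n_steps: int) -> float:
    # Explicit Euler integration from time T down to 0: one velocity call per step.
    x, t = x_T, T
    h = T / n_steps
    for _ in range(n_steps):
        x = x - h * velocity(x, t)
        t -= h
    return x

x_T, T = 1.0, 1.0
one_step = solution(x_T, T, 0.0)  # a single evaluation (1 NFE)
for n in (1, 4, 16, 64):
    err = abs(euler_sample(x_T, T, n) - one_step)
    print(f"Euler, {n:>2} NFE: error = {err:.4f}")
```

Because Euler is first order, its error here shrinks roughly in proportion to 1/NFE, which is the multi-step cost that one-step solution models try to avoid.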
1. I have to say that the writing of this paper needs significant revision. First, in Eq. 1, there is no explanation for $\alpha^{'}_t$ and $b^{'}_t$. Eq. 5 should at least be split into two equations rather than combined into one. I highly recommend that the authors add the integral formulation of the unique solution. This way, the reader could clearly see why $f(x_t, t, t)$ contains two time-related variables, one for the start and one for the end. Meanwhile, it may be better to expa
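For reference on the reviewer's suggestion, the standard integral form of an ODE solution, written in the f(x_t, t, s) notation used in the reviews (start time t, end time s) with a velocity field v, is shown below. This is general ODE background assuming a unique solution (e.g. a Lipschitz velocity field), not a quote of the paper's equations; the last identity is the composition property of the solution map.

```latex
f(x_t, t, s) = x_t + \int_{t}^{s} v\big(f(x_t, t, u), u\big)\, du,
\qquad f(x_t, t, t) = x_t,
\qquad \frac{\partial}{\partial s} f(x_t, t, s) = v\big(f(x_t, t, s), s\big),
\qquad f\big(f(x_t, t, l), l, s\big) = f(x_t, t, s).
```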
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Model Reduction and Neural Networks · Domain Adaptation and Few-Shot Learning
