STORK: Faster Diffusion And Flow Matching Sampling By Resolving Both Stiffness And Structure-Dependence

Zheng Tan; Weizhen Wang; Andrea L. Bertozzi; Ernest K. Ryu

arXiv:2505.24210 · cs.CV · October 2, 2025

TL;DR

STORK is a training-free sampling method that accelerates diffusion and flow-matching models by addressing both ODE stiffness and dependence on the semi-linear structure of the diffusion ODE, reducing the number of function evaluations needed without sacrificing quality.

Contribution

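Introduces STORK (Stabilized Taylor Orthogonal Runge-Kutta), a sampling technique that handles the stiffness of the sampling ODE without relying on the semi-linear structure of the diffusion ODE, so it applies directly to both diffusion and flow-matching models.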

Findings

1. Enhanced sampling speed with fewer function evaluations.
2. Improved image and video generation quality.
3. Applicable to both diffusion and flow-matching models.

Abstract

Diffusion models (DMs) and flow-matching models have demonstrated remarkable performance in image and video generation. However, such models require a significant number of function evaluations (NFEs) during sampling, leading to costly inference. Consequently, quality-preserving fast sampling methods that require fewer NFEs have been an active area of research. However, prior training-free sampling methods fail to simultaneously address two key challenges: the stiffness of the ODE (i.e., the non-straightness of the velocity field) and dependence on the semi-linear structure of the DM ODE (which limits their direct applicability to flow-matching models). In this work, we introduce the Stabilized Taylor Orthogonal Runge-Kutta (STORK) method, addressing both design concerns. We demonstrate that STORK consistently improves the quality of diffusion and flow-matching sampling for image and…
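
To make the stiffness issue concrete, the following is a minimal sketch on a toy linear ODE, not the STORK algorithm itself. It compares explicit Euler with a classical undamped first-order Chebyshev (stabilized Runge-Kutta) step at the same number of function evaluations. The test problem, the step sizes, and the names `euler_step`, `chebyshev_step`, and `integrate` are assumptions chosen for illustration; none of them come from the paper.

```python
# Toy illustration of stiffness (an assumption-laden sketch, not STORK).
# Test problem: y' = -lam * y, whose exact solution decays to zero.

def euler_step(f, y, h):
    """One explicit Euler step: stable on this problem only if h * lam <= 2."""
    return y + h * f(y)

def chebyshev_step(f, y, h, s):
    """One step of the undamped first-order s-stage Chebyshev method.

    Its stability polynomial is T_s(1 + z / s**2), so the stable range
    along the negative real axis is about [-2 * s**2, 0]: it grows
    quadratically with the number of stages s.
    """
    g_prev, g = y, y + (h / s**2) * f(y)
    for _ in range(2, s + 1):
        # Three-term Chebyshev recurrence; one function evaluation per stage.
        g_prev, g = g, 2.0 * g - g_prev + (2.0 * h / s**2) * f(g)
    return g

def integrate(step, y0, n_steps):
    """Apply a one-step map n_steps times starting from y0."""
    y = y0
    for _ in range(n_steps):
        y = step(y)
    return y

lam = 200.0                 # stiffness parameter of the toy problem
f = lambda y: -lam * y
h, s, n = 0.1, 4, 10        # 10 outer steps of size 0.1, 4 stages each

# Same budget of 40 function evaluations for both methods.
y_euler = integrate(lambda y: euler_step(f, y, h / s), 1.0, n * s)
y_cheb = integrate(lambda y: chebyshev_step(f, y, h, s), 1.0, n)

print(f"Euler, 40 NFEs:     {y_euler:.3e}")   # blows up
print(f"Chebyshev, 40 NFEs: {y_cheb:.3e}")    # decays toward zero
```

With this budget, each Euler substep has h * lam = 5, outside its stability limit of 2, so the iterate grows without bound. The four-stage Chebyshev step has h * lam = 20, inside its stable range of roughly 32, so the iterate decays. The sketch shows only stability, not accuracy: the Chebyshev result is still far from the exact solution at this step size.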

Peer Reviews

Decision: ICLR 2026 Poster

Reviewer 01 · Rating 6 · Confidence 4

Strengths

1. Clear motivation and message: the authors clearly present what problem they aim to solve and how they approach it.
2. Conceptually sound method: leveraging SRK to handle stiffness and using Taylor-based virtual NFEs to reduce computational cost yields a viable and efficient sampler for flow-matching models.
3. Comprehensive empirical evaluation: covering unconditional/conditional image generation and text-to-video tasks, with consistently strong results.
4. Well-written manuscript: t

Weaknesses

1. Runtime analysis: the authors report only NFE here; a more thorough comparison would also report wall-clock time and GPU (VRAM) usage, so that readers can understand in detail, e.g., how much of that time is spent on the virtual NFE calculation.
2. On Table 1,
   - Clarification on NFE reporting: since they're comparing samplers with different kinds of evaluations, such as inner NFEs for higher-order/intermediate calculation, or virtual NFEs, and so on... Table 1 is a bi

Reviewer 02 · Rating 4 · Confidence 4

Strengths

- The results presented in this paper are solid. Quantitative evaluations demonstrate a clear advantage of STORK over UniPC and DPM-Solver in few-step settings. Qualitative results also look promising: STORK seems to generate more details than other solvers under few-step settings, especially on video generation.
- Despite the dense math, the presentation quality of this paper is high, making it easily readable. Notably, this paper makes a connection with the notion of stiffness in classical num

Weaknesses

- L300 states that "naive application of SRK4 to the CIFAR-10 dataset results in very poor sampling results", so the authors propose to plug a Taylor approximation into SRK4. This is an interesting observation, but the reason is not analyzed in depth. If SRK is considered a common method for solving stiff ODEs, why would it not work in this case? Using Taylor expansion and Adams-Bashforth approximation is common in previous flow ODE solvers, so plugging this into SRK seems to

Reviewer 03 · Rating 2 · Confidence 4

Strengths

1. The paper introduces a novel method for faster sampling from diffusion models.
2. The performance of STORK-4 shows improvements when NFE is small.

Weaknesses

1. The paper motivates the proposed method by claiming diffusion models exhibit stiff dynamics, but this is not properly justified empirically or theoretically. There is insufficient evidence presented that diffusion models are actually stiff, making the motivation unclear. Table 1 shows that STORK-4 is significantly better than SRK4. More intriguingly, even at 50 NFE, SRK4 reaches an FID score of around 6.167, which is still worse than STORK-4 at 10 NFE. This raises the question

Code & Models

Repositories

zt220501/stork
PyTorch · Official

Taxonomy

Topics: Medical Imaging Techniques and Applications · Advanced MRI Techniques and Applications · Model Reduction and Neural Networks

Methods: Diffusion