One Step Diffusion via Shortcut Models

Kevin Frans; Danijar Hafner; Sergey Levine; Pieter Abbeel

arXiv:2410.12557·cs.LG·June 24, 2025·2 cites

TL;DR

Shortcut models offer a simplified, single-network approach to fast, high-quality image generation by conditioning on step size, outperforming previous methods across various sampling budgets.

Contribution

Introduction of shortcut models that enable high-quality, flexible, and fast sampling with a single network and training phase, eliminating complex training procedures.

Findings

01. Shortcut models outperform previous methods like consistency models and reflow.

02. They produce higher quality samples across various sampling steps.

03. They enable variable step budgets at inference without retraining.
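
The variable step budget can be illustrated with a minimal sampling sketch. Everything here is an assumption made for illustration: `toy_shortcut` is a hypothetical stand-in for a trained network `s(x, t, d)`, the target is a single point `MU`, and time runs from noise at `t = 0` to data at `t = 1`. The point is only that the same function is reused with different step counts, with the step size `d` passed in as an input.

```python
import numpy as np

# Toy "dataset": every sample should end at this single point (assumption).
MU = np.array([2.0, -1.0])

def toy_shortcut(x, t, d):
    """Hypothetical stand-in for a trained network s(x, t, d).

    For a point-mass target the paths are straight lines, so the average
    direction over any step size d is (MU - x) / (1 - t). The argument d is
    accepted only to mirror the step-size conditioning.
    """
    return (MU - x) / (1.0 - t)

def sample(shortcut, x, num_steps):
    """Move from noise (t=0) to data (t=1) in num_steps equal steps."""
    d = 1.0 / num_steps
    for k in range(num_steps):
        t = k * d
        x = x + d * shortcut(x, t, d)  # jump ahead by step size d
    return x

rng = np.random.default_rng(0)
noise = rng.standard_normal(2)

# Same function, different budgets; no retraining involved.
one_step = sample(toy_shortcut, noise, num_steps=1)
four_step = sample(toy_shortcut, noise, num_steps=4)
```

In this toy setting every budget reaches the same point. A real network would trade quality against the number of steps.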

Abstract

Diffusion models and flow-matching models have enabled generating diverse and realistic images by learning to transfer noise to data. However, sampling from these models involves iterative denoising over many neural network passes, making generation slow and expensive. Previous approaches for speeding up sampling require complex training regimes, such as multiple training phases, multiple networks, or fragile scheduling. We introduce shortcut models, a family of generative models that use a single network and training phase to produce high-quality samples in a single or multiple sampling steps. Shortcut models condition the network not only on the current noise level but also on the desired step size, allowing the model to skip ahead in the generation process. Across a wide range of sampling step budgets, shortcut models consistently produce higher quality samples than previous…
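
One way to read "skip ahead" is that a single step of size 2d should land where two consecutive steps of size d land. The sketch below forms such a target and measures how far a model is from satisfying it. It is a sketch under stated assumptions, not the authors' implementation: `consistent_model` and `inconsistent_model` are hypothetical toy functions, `MU` is an assumed point-mass target, and time is assumed to run from noise at `t = 0` to data at `t = 1`.

```python
import numpy as np

MU = np.array([2.0, -1.0])  # assumed toy target point

def consistent_model(x, t, d):
    # Hypothetical model whose steps compose exactly for a point-mass target.
    return (MU - x) / (1.0 - t)

def inconsistent_model(x, t, d):
    # Hypothetical model whose output wrongly scales with the step size.
    return (MU - x) / (1.0 - t) * (1.0 + d)

def self_consistency_target(shortcut, x, t, d):
    """Average direction of two consecutive steps of size d."""
    s1 = shortcut(x, t, d)          # direction of the first small step
    x_mid = x + d * s1              # where the first small step lands
    s2 = shortcut(x_mid, t + d, d)  # direction of the second small step
    # In a real training loop this target would typically be held fixed
    # (no gradient flowing through it); that detail is an assumption here.
    return 0.5 * (s1 + s2)

def self_consistency_loss(shortcut, x, t, d):
    """Squared error between one 2d step and the two-step target."""
    target = self_consistency_target(shortcut, x, t, d)
    pred = shortcut(x, t, 2.0 * d)
    return float(np.mean((pred - target) ** 2))

x = np.zeros(2)
loss_ok = self_consistency_loss(consistent_model, x, t=0.25, d=0.25)
loss_bad = self_consistency_loss(inconsistent_model, x, t=0.25, d=0.25)
```

The consistent toy model gives a loss of essentially zero, while the inconsistent one is penalized.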

Peer Reviews

Decision·ICLR 2025 Oral

Reviewer 01 · Rating 8 · Confidence 4

Strengths

- The paper is well written and easy to follow.
- The proposed idea is novel and promising; this work pioneers a new family of diffusion/flow-based generative models.
- The method is experimentally evaluated in a controlled setting, and an extensive comparison with alternative methods is present.
- An open-source codebase to replicate all the experiments and pretrained models will be released to facilitate further research.

Weaknesses

- The performance of 1-step generated samples still lags behind multi-step methods; 1-step generated images present some artifacts, so the usability of the proposed solution in a practical setting is limited to generating proxy images with 1 step and resampling them with multiple steps.
- End-to-end training to reach a few sample diffusion model is an interesting solution, as it removes the burden of performing two separate training phases and leads to a flexible model. However the best results in 1 s

Reviewer 02 · Rating 8 · Confidence 4

Strengths

* The paper is well written in the context of reducing sampling steps of diffusion models.
* The proposed method is novel and clearly presented.
* The experiments are comprehensive and convincing.

Weaknesses

* Lacks explicit comparison of training compute against other methods.
* The issue with CFG could hinder future practical use.

Reviewer 03 · Rating 8 · Confidence 4

Strengths

1. _Simplicity_: The proposed method is simple and easy to understand, and it simplifies some challenges previously encountered in other methods to obtain one-step or few-step models. For example, it gets rid of the two-stage training required for diffusion distillation as well as the complex training scheduling required for consistency models. The loss objective is intuitive: it builds upon the flow matching loss and introduces an additional self-consistency condition.
2. _Writing_: The paper is well-written. The cor

Weaknesses

Note: The template of this paper seems to be different and doesn’t have the line numbers present in the standard template for papers under review.

1. The proposed method for shortcut sampling is general and should work for any Gaussian probability path ($x_t = \alpha_t x_0 + \sigma_t \epsilon$); however, this hasn’t been explored or described in the paper. The paper considers only one choice of optimal transport probability path ($x_t = (1-t) x_0 + t \epsilon$). However, nothing constrains the met

Taxonomy

Topics: Simulation Techniques and Applications · Statistical and Computational Modeling

Methods: Consistency Models