High-Order Matching for One-Step Shortcut Diffusion Models

Bo Chen; Chengyue Gong; Xiaoyu Li; Yingyu Liang; Zhizhou Sha; Zhenmei Shi; Zhao Song; Mingda Wan

arXiv:2502.00688·cs.CV·February 4, 2025

Open Access · 3 Reviews

TL;DR

HOMO introduces high-order supervision into one-step diffusion models, significantly improving trajectory smoothness, stability, and geometric accuracy over existing first-order methods, especially in complex, high-curvature regions.

Contribution

The paper presents HOMO, a novel high-order supervision framework that enhances one-step diffusion models by incorporating acceleration and higher derivatives, addressing limitations of prior first-order approaches.
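
To make the role of acceleration concrete, here is a minimal sketch that compares a single first-order step with a single second-order step along a curved path. It is an illustration only, not the authors' training or sampling procedure: the unit-circle trajectory, the step size, and the helper names (`exact`, `velocity`, `acceleration`, `one_step`, `step_error`) are all assumptions chosen for the example, and the derivatives are analytic rather than learned by a network.

```python
import math

# Hypothetical toy trajectory (an assumption for illustration): a point
# moving on the unit circle, so velocity and acceleration are known exactly.
def exact(t):
    return (math.cos(t), math.sin(t))

def velocity(t):
    return (-math.sin(t), math.cos(t))

def acceleration(t):
    return (-math.cos(t), -math.sin(t))

def one_step(t, d, order):
    """Jump from time t to t + d with a Taylor step of the given order."""
    x, v, a = exact(t), velocity(t), acceleration(t)
    # First-order (velocity-only) update.
    out = [x[i] + d * v[i] for i in range(2)]
    if order >= 2:
        # Second-order correction using acceleration.
        out = [out[i] + 0.5 * d * d * a[i] for i in range(2)]
    return tuple(out)

def step_error(t, d, order):
    """Euclidean distance between the one-step estimate and the true point."""
    return math.dist(one_step(t, d, order), exact(t + d))

first_order_error = step_error(0.0, 0.5, order=1)
second_order_error = step_error(0.0, 0.5, order=2)
print(first_order_error, second_order_error)
```

On this curved path the second-order step lands closer to the true point than the velocity-only step, which is the intuition behind supervising higher derivatives. Whether the same gain holds for learned, high-dimensional fields is a separate question that this toy example does not address.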

Findings

01

HOMO achieves smoother and more stable trajectories.

02

HOMO outperforms first-order models in high-curvature regions.

03

Theoretically, HOMO provides superior approximation accuracy.
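
As background for the approximation-accuracy finding, the standard Taylor expansions below show why adding an acceleration term can reduce the error of a single step of size $d$. This is a textbook identity stated under the assumption that the trajectory $x(t)$ is sufficiently smooth; it is not the paper's theorem, and the symbols $x$, $t$, and $d$ are chosen here for illustration.

$$
\begin{aligned}
x(t+d) &= x(t) + d\,\dot{x}(t) + O(d^{2}) && \text{(first order: velocity only)} \\
x(t+d) &= x(t) + d\,\dot{x}(t) + \tfrac{d^{2}}{2}\,\ddot{x}(t) + O(d^{3}) && \text{(second order: velocity and acceleration)}
\end{aligned}
$$

The leading error of the velocity-only step is proportional to $\ddot{x}(t)$, so it is largest exactly where the path curves most.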

Abstract

One-step shortcut diffusion models [Frans, Hafner, Levine and Abbeel, ICLR 2025] have shown potential in vision generation, but their reliance on first-order trajectory supervision is fundamentally limited. The Shortcut model's simplistic velocity-only approach fails to capture intrinsic manifold geometry, leading to erratic trajectories, poor geometric alignment, and instability, especially in high-curvature regions. These shortcomings stem from its inability to model mid-horizon dependencies or complex distributional features, leaving it ill-equipped for robust generative modeling. In this work, we introduce HOMO (High-Order Matching for One-Step Shortcut Diffusion), a game-changing framework that leverages high-order supervision to revolutionize distribution transportation. By incorporating acceleration, jerk, and beyond, HOMO not only fixes the flaws of the Shortcut model but also…

Peer Reviews

Decision·ICLR 2026 Conference Withdrawn Submission

Reviewer 01 · Rating 2 · Confidence 3

Strengths

The core idea of explicitly modeling higher-order terms along the transport path with separate networks is simple and potentially broadly applicable. The paper gives concrete training and sampling procedures that are easy to implement, and on standard 2D benchmarks it shows consistent improvements over a first-order shortcut baseline.

Weaknesses

The writing quality materially hurts readability. For example, the text says "we define Shortcut model compute next field" which is ungrammatical and obscures meaning. On the theory side, the approximation bounds are not informative for learning: even with large models, the bound in 5.1 retains an additive term $\mathbb{E} \left[\|\dot x_{\text{true}}-\ddot{x}_{\text{true}}\|^2\right]$ that does not vanish. The results do not show that the learned velocity and acceleration converge to the truth

Reviewer 02 · Rating 2 · Confidence 4

Strengths

The main strength is that the paper has a good idea, namely to use a second-order Taylor approximation in the setting of [Frans et al. 2025]; however, I think that this paper is not in final form.

Weaknesses

The main weakness is the experiments. If the claim is that second or higher order is better than the original, then it should be tested on the same datasets and show stronger performance there. Testing on 2D distributions is not very convincing, because higher-dimensional geometry sometimes has counterintuitive properties that are not easily captured by testing only on distributions (as complex as they may be called) in R^2. The proofs of the main theorems (appendices.

Reviewer 03 · Rating 2 · Confidence 3

Strengths

* The main idea behind the paper is well-founded and motivated, as it makes a lot of sense to expect that further guidance from higher order terms should help performance.
* The theoretical justification for the use and importance of higher order terms in HOMO is, as far as I can see, good.
* The paper does include a lot of empirical results, from providing many ablations with different combinations of M1, M2, SC.

Weaknesses

* The main issue with the paper is that all reported results are on 2D toy experiments. For instance, there are no large-scale evaluations on problems of actual interest such as CIFAR or ImageNet, nor comparisons against other relevant one-step/few-step generative models, like [1, 2] etc.
* The main point that the paper needs to make is that the computational trade-off of additional computation from handling higher-order terms pays off in terms of performance. From Appendix H, ther

Taxonomy

Topics: Statistical Methods and Inference