Toward Theoretical Insights into Diffusion Trajectory Distillation via Operator Merging
Weiguo Gao, Ming Li

TL;DR
This paper offers a theoretical framework for diffusion trajectory distillation, analyzing operator merging in linear and nonlinear regimes to optimize sampling efficiency and understand approximation errors.
Contribution
It introduces a theoretical reinterpretation of trajectory distillation as operator merging, deriving optimal strategies and analyzing errors in different regimes.
Findings
In the linear Gaussian regime, the optimal merging strategy exhibits a variance-driven phase transition.
In the nonlinear regime, composite step distillation incurs unavoidable approximation error.
The theoretical analysis guides the choice among diffusion trajectory distillation methods.
Abstract
Diffusion trajectory distillation accelerates sampling by training a student model to approximate the multi-step denoising trajectories of a pretrained teacher model using far fewer steps. Despite strong empirical results, the trade-off between distillation strategy and generative quality remains poorly understood. We provide a theoretical characterization by reinterpreting trajectory distillation as an operator merging problem, differentiating our analysis between two distinct regimes. In the linear Gaussian regime, where approximation error is zero, we isolate optimization error, specifically signal shrinkage driven by finite training time, as the primary bottleneck. This characterization allows us to derive the theoretically optimal merging strategy, which exhibits a variance-driven phase transition and is computable via a Pareto dynamic programming algorithm. In the nonlinear…
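The operator-merging view in the linear Gaussian regime can be illustrated with a toy sketch. It is not the paper's derivation. It assumes each teacher step is a diagonal linear map, that the student is a single linear map trained from zero by gradient flow on a squared loss, and that inputs have per-coordinate variances. All names here (`teacher_steps`, `merge`, `student_after_training`, `variances`) are hypothetical. Under these assumptions the merged operator reproduces the teacher trajectory exactly (zero approximation error), while finite training time leaves the student's weights shrunk toward zero, more so in low-variance directions.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4

# Assumption: each teacher denoising step is a diagonal linear map x -> A_k x.
teacher_steps = [np.diag(rng.uniform(0.5, 1.5, size=d)) for _ in range(3)]

def merge(ops):
    """Compose linear operators, applied in list order, into one matrix."""
    merged = np.eye(ops[0].shape[0])
    for op in ops:
        merged = op @ merged
    return merged

merged = merge(teacher_steps)

# Running the teacher step by step matches one application of the merged map.
x = rng.normal(size=d)
y_teacher = x.copy()
for op in teacher_steps:
    y_teacher = op @ y_teacher
assert np.allclose(merged @ x, y_teacher)

def student_after_training(target, variance, t):
    """Per-coordinate student weight after gradient flow for time t.

    Toy model: minimize 0.5 * E[(w*x - target*x)^2] with E[x^2] = variance
    and w(0) = 0, so dw/dt = -variance * (w - target) and
    w(t) = target * (1 - exp(-variance * t)).
    """
    return target * (1.0 - np.exp(-variance * t))

variances = np.array([0.1, 0.5, 1.0, 4.0])  # assumed input variances
target = np.diag(merged)                    # merged teacher scaling per coordinate
student = student_after_training(target, variances, t=2.0)

# Ratio below 1 is the signal shrinkage caused by finite training time.
shrinkage = student / target
print(shrinkage)
```

In this sketch the shrinkage factor depends only on the product of variance and training time, which gives one intuition for why the best way to group teacher steps could depend on the data variance.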
