Elucidating the Preconditioning in Consistency Distillation
Kaiwen Zheng, Guande He, Jianfei Chen, Fan Bao, Jun Zhu

TL;DR
This paper provides theoretical insights into preconditioning in consistency distillation for diffusion models, proposing an analytic method to optimize it, leading to faster training and better trajectory alignment.
Contribution
It introduces a theoretical framework for preconditioning in consistency distillation and proposes Analytic-Precond, an optimization method that improves training efficiency and model performance.
Findings
Achieves 2x to 3x training acceleration.
Enhances alignment of student and teacher trajectories.
Facilitates learning of trajectory jumpers.
Abstract
Consistency distillation is a prevalent way of accelerating diffusion models adopted in consistency (trajectory) models, in which a student model is trained to traverse backward on the probability flow (PF) ordinary differential equation (ODE) trajectory determined by the teacher model. Preconditioning is a vital technique for stabilizing consistency distillation: it defines the consistency function as a linear combination of the input data and the network output with pre-defined coefficients. It imposes the boundary condition of consistency functions without restricting the form and expressiveness of the neural network. However, previous preconditionings are hand-crafted and may be suboptimal choices. In this work, we offer the first theoretical insights into the preconditioning in consistency distillation, by elucidating its design criteria and the connection to the teacher ODE trajectory. Based on…
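To make the role of preconditioning concrete, here is a minimal sketch of a hand-crafted preconditioning of the kind the abstract describes. It is not the paper's Analytic-Precond. It assumes the coefficient choice used in the original consistency models work, with `SIGMA_DATA = 0.5` and a minimum time `EPS = 0.002`; the function `toy_network` is a hypothetical stand-in for the neural network, and all names are illustrative.

```python
import math

SIGMA_DATA = 0.5   # assumed data standard deviation
EPS = 0.002        # assumed smallest time on the ODE trajectory


def c_skip(t):
    """Coefficient on the input data; equals 1 at t = EPS."""
    return SIGMA_DATA**2 / ((t - EPS) ** 2 + SIGMA_DATA**2)


def c_out(t):
    """Coefficient on the network output; equals 0 at t = EPS."""
    return SIGMA_DATA * (t - EPS) / math.sqrt(SIGMA_DATA**2 + t**2)


def toy_network(x, t):
    # Hypothetical stand-in for the free-form network F_theta(x, t).
    return [math.tanh(v) * t for v in x]


def consistency_function(x, t, network=toy_network):
    """f(x, t) = c_skip(t) * x + c_out(t) * F(x, t)."""
    out = network(x, t)
    return [c_skip(t) * xi + c_out(t) * oi for xi, oi in zip(x, out)]


x = [0.3, -1.2, 0.8]
# At t = EPS the coefficients are (1, 0), so f(x, EPS) returns x unchanged,
# whatever the network outputs: the boundary condition holds by construction.
print(consistency_function(x, EPS))
# At a large time the skip coefficient is close to 0 and the network
# output dominates.
print(consistency_function(x, 80.0))
```

Because the boundary condition is enforced by the coefficients alone, the network itself is left unconstrained. The coefficients at other times are a design choice, which is the degree of freedom the paper proposes to set analytically rather than by hand.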
Peer Reviews
Decision: ICLR 2025 Poster
1. Complete proofs are included for each proposition in the manuscript.
2. Extensive numerical experiments are provided to validate the effectiveness of the proposed methodology.
Presentation of the manuscript can be further improved by rewriting certain phrases and expanding on some technical details. For instance, the phrase "CMs aim to a consistency function" on line 134 might be better rephrased as "CMs aim to learn a consistency function". For possible ways of explaining technical details in a better way, one may refer to the "Questions" section below.
**Theoretical Innovation in Preconditioning**: The paper introduces "Analytic-Precond," a novel, analytically derived preconditioning method that theoretically optimizes the consistency distillation process. This goes beyond prior handcrafted preconditionings, offering a principled approach that minimizes the consistency gap between the teacher and student models. This theoretical grounding not only strengthens the methodology but also provides new insights into consistency distillation. **Sign
- The paper does not report whether BCM outperforms CTM + Analytic-Precond in terms of FID.
- Analytic-Precond does not perform better when a GAN loss is incorporated into CTM. Can the authors provide an explanation or intuition for this?
- The paper presents strong mathematical arguments to support the choice of coefficients, including an explanation for the CTM choices that, unlike previous methods, is not based on intuition alone.
- The paper provides numerical evidence for its claims, highlighting when _Analytic-Precond_ offers no advantage (single step) and when it does (two or more steps).
- The paper might be a little hard to read for those who are not familiar with distillation. I personally took a while to grasp the setting and all the notation. For example, $\phi$ is used many times before it is defined. It could be worth adding a brief discussion of nomenclature such as _teacher_ and _student_.
Taxonomy
Topics: Process Optimization and Integration
Methods: Diffusion
