Elucidating the Preconditioning in Consistency Distillation

Kaiwen Zheng; Guande He; Jianfei Chen; Fan Bao; Jun Zhu

arXiv:2502.02922·cs.LG·May 1, 2025

Open Access · 3 Reviews

TL;DR

This paper provides theoretical insights into preconditioning in consistency distillation for diffusion models, proposing an analytic method to optimize it, leading to faster training and better trajectory alignment.

Contribution

It introduces a theoretical framework for preconditioning in consistency distillation and proposes Analytic-Precond, an optimization method that improves training efficiency and model performance.

Findings

1. Achieves 2x to 3x training acceleration.
2. Enhances alignment of student and teacher trajectories.
3. Facilitates learning of trajectory jumpers.

Abstract

Consistency distillation is a prevalent way of accelerating diffusion models, adopted in consistency (trajectory) models, in which a student model is trained to traverse backward on the probability flow (PF) ordinary differential equation (ODE) trajectory determined by the teacher model. Preconditioning is a vital technique for stabilizing consistency distillation: the consistency function is defined as a linear combination of the input data and the network output with pre-defined coefficients. It imposes the boundary condition of consistency functions without restricting the form and expressiveness of the neural network. However, previous preconditionings are hand-crafted and may be suboptimal choices. In this work, we offer the first theoretical insights into the preconditioning in consistency distillation, by elucidating its design criteria and the connection to the teacher ODE trajectory. Based on…
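The linear-combination form described in the abstract can be illustrated with a minimal sketch. It uses the hand-crafted coefficients commonly adopted in consistency models (an assumption here, and not the paper's Analytic-Precond), and `toy_network`, `SIGMA_DATA`, and `EPS` are hypothetical placeholders chosen only for illustration. The point is that the boundary condition holds for any network, because the coefficients alone enforce it.

```python
import numpy as np

# Assumed, illustrative constants (not taken from this page).
SIGMA_DATA = 0.5   # nominal standard deviation of the data
EPS = 0.002        # smallest time on the trajectory

def c_skip(t):
    """Weight on the input data; equals 1 at t = EPS."""
    return SIGMA_DATA**2 / ((t - EPS) ** 2 + SIGMA_DATA**2)

def c_out(t):
    """Weight on the network output; equals 0 at t = EPS."""
    return SIGMA_DATA * (t - EPS) / np.sqrt(SIGMA_DATA**2 + t**2)

def toy_network(x, t):
    """Hypothetical stand-in for the free-form neural network."""
    return np.tanh(x) * np.cos(t)

def consistency_fn(x, t, network=toy_network):
    """Preconditioned consistency function: c_skip(t) * x + c_out(t) * F(x, t)."""
    return c_skip(t) * x + c_out(t) * network(x, t)

x = np.array([0.3, -1.2, 2.0])
f_boundary = consistency_fn(x, EPS)   # boundary condition: output equals the input
f_large_t = consistency_fn(x, 80.0)   # at large t the network output dominates
```

Swapping `toy_network` for any other function leaves `f_boundary` equal to `x`, which is the sense in which preconditioning imposes the boundary condition without restricting the network.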

Peer Reviews

Decision · ICLR 2025 Poster

Reviewer 01 · Rating 6 · Confidence 3

Strengths

1. Complete proofs are included for each proposition in the manuscript.
2. Extensive numerical experiments are provided to validate the effectiveness of the proposed methodology.

Weaknesses

The presentation of the manuscript could be improved by rewording certain phrases and expanding on some technical details. For instance, the phrase "CMs aim to a consistency function" on line 134 might be better rephrased as "CMs aim to learn a consistency function". For suggestions on explaining technical details more clearly, see the "Questions" section below.

Reviewer 02 · Rating 6 · Confidence 3

Strengths

**Theoretical Innovation in Preconditioning**: The paper introduces "Analytic-Precond," a novel, analytically derived preconditioning method that theoretically optimizes the consistency distillation process. This goes beyond prior handcrafted preconditionings, offering a principled approach that minimizes the consistency gap between the teacher and student models. This theoretical grounding not only strengthens the methodology but also provides new insights into consistency distillation. **Sign

Weaknesses

- The paper does not report whether BCM is better than CTM + Analytic-Precond in terms of FID.
- Analytic-Precond does not perform better when a GAN is incorporated into CTM. Can the authors provide an explanation or intuition for this?

Reviewer 03 · Rating 6 · Confidence 2

Strengths

- The paper presents strong mathematical arguments to support the choice of coefficients, including an explanation for the CTM choices that rests on more than intuition, unlike previous methods.
- The paper gives numerical evidence for its claims, highlighting when _Analytic-Precond_ offers no advantage (single step) and when it does (two or more steps).

Weaknesses

- The paper might be a little hard to read for readers who are not familiar with distillation. I personally took a while to grasp the setting and all the notation. For example, $\phi$ is used many times before it is defined. It could be worth adding a brief discussion of nomenclature such as _teacher_ and _student_.

Taxonomy

Topics · Process Optimization and Integration

Methods · Diffusion