Consistency Models Made Easy

Zhengyang Geng; Ashwini Pokle; William Luo; Justin Lin; J. Zico Kolter

arXiv:2406.14548·cs.LG·October 14, 2024

TL;DR

This paper introduces Easy Consistency Tuning (ECT), a method that significantly reduces training time for consistency models by fine-tuning pretrained diffusion models, achieving high-quality results efficiently.

Contribution

The authors propose ECT, a novel fine-tuning scheme that simplifies training consistency models and demonstrates improved efficiency and scalability compared to previous methods.

Findings

01

ECT achieves a 2-step FID of 2.73 on CIFAR-10 within 1 hour on a single GPU.

02

ECT matches the quality of extensive training methods with much less computational cost.

03

Scaling laws for CMs under ECT follow a power law, indicating potential for larger-scale improvements.
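A power-law relationship of the form y ≈ a · x^b appears as a straight line in log-log space, so it can be checked with an ordinary least-squares fit on the logarithms. The sketch below illustrates that idea only: the function name `fit_power_law` and the data points are hypothetical and synthetic, not measurements from the paper.

```python
import math


def fit_power_law(xs, ys):
    """Fit y ~ a * x**b by ordinary least squares in log-log space.

    Returns the pair (a, b). All inputs must be strictly positive.
    """
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    # Slope of the regression line in log-log space is the exponent b.
    sxx = sum((u - mean_x) ** 2 for u in log_x)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(log_x, log_y))
    b = sxy / sxx
    # Intercept of the line is log(a).
    a = math.exp(mean_y - b * mean_x)
    return a, b


# Synthetic, made-up data (NOT results from the paper):
# an exact power law y = 50 * x**-0.25 over hypothetical compute budgets.
compute = [1.0, 2.0, 4.0, 8.0, 16.0]
metric = [50.0 * c ** -0.25 for c in compute]

a, b = fit_power_law(compute, metric)
print(f"fitted prefactor a = {a:.4f}, exponent b = {b:.4f}")
```

On real measurements the points will not lie exactly on a line, so the residuals of the log-log fit are worth inspecting before claiming power-law behavior.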

Abstract

Consistency models (CMs) offer faster sampling than traditional diffusion models, but their training is resource-intensive. For example, as of 2024, training a state-of-the-art CM on CIFAR-10 takes one week on 8 GPUs. In this work, we propose an effective scheme for training CMs that largely improves the efficiency of building such models. Specifically, by expressing CM trajectories via a particular differential equation, we argue that diffusion models can be viewed as a special case of CMs. We can thus fine-tune a consistency model starting from a pretrained diffusion model and progressively approximate the full consistency condition to stronger degrees over the training process. Our resulting method, which we term Easy Consistency Tuning (ECT), achieves vastly reduced training times while improving upon the quality of previous methods: for example, ECT achieves a 2-step FID of 2.73 on…
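The claim that diffusion models are a special case of CMs can be sketched in standard consistency-model notation. The symbols below ($f_\theta$ for the model, $x_t$ for the noisy sample at time $t$, $d$ for a distance metric, $\theta^-$ for a stop-gradient copy of the weights, and $r < t$ for the earlier time point) are assumptions chosen for illustration; the abstract as excerpted here does not define them, so this is a reading of the idea rather than the paper's exact formulation.

```latex
% Consistency condition along a trajectory, with its boundary condition:
\frac{\mathrm{d}}{\mathrm{d}t} f_\theta(x_t, t) = 0,
\qquad f_\theta(x_0, 0) = x_0 .

% Discretized training objective with two time points 0 \le r < t:
\mathcal{L}(\theta) = \mathbb{E}\left[ d\big( f_\theta(x_t, t),\, f_{\theta^-}(x_r, r) \big) \right].

% Special case r = 0: the boundary condition gives f_{\theta^-}(x_0, 0) = x_0, so
\mathcal{L}(\theta)\big|_{r=0} = \mathbb{E}\left[ d\big( f_\theta(x_t, t),\, x_0 \big) \right],
% which is a denoising (diffusion-style) objective.
```

Under this reading, a pretrained diffusion model already satisfies the loosest form of the condition ($r = 0$), and fine-tuning tightens it by moving $r$ toward $t$ over the course of training.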

Peer Reviews

Decision·ICLR 2025 Poster

Reviewer 01 · Rating 6 · Confidence 4

Strengths

The approach is very efficient, as the authors show. It could be used to greatly improve the efficiency and performance of CMs at a large scale.

Weaknesses

The main motivation of the paper is straightforward. It is hard for the reader to fully trust the authors' observation that diffusion models can be viewed as a special case of CMs in practice. The datasets and metrics used for image generation are limited. More extensive experiments or analysis should be conducted to justify the claims.

Reviewer 02 · Rating 8 · Confidence 4

Strengths

**Results.** The primary strength of this work lies in its empirical results, achieving state-of-the-art performance on standard image generation benchmarks. Overall, the experimental analysis is quite comprehensive by comparing against (and outperforming) recent and strong baseline methods in Table 1, and ablating some key design choices in the appendix. The ECT scaling laws in Section 4.2 are an interesting and underexplored direction in CMs, and the authors provide evidence of improved sample

Weaknesses

**Novelty.** Despite the strong results demonstrated in this work, I have questions about its novelty. The main methodological contribution is to initialize CM training (CT) with a pre-trained DM to enable faster convergence. In the setting of consistency distillation (CD) and DM distillation in general, it is already common practice to initialize the student from the weights of the teacher DM, effectively reducing distillation to a fine-tuning task to reduce computational requirements and facil

Reviewer 03 · Rating 5 · Confidence 4

Strengths

1. This paper explores the concept of the "curse of consistency," which presents an intriguing perspective.
2. The method proposed in this paper achieves good performance.
3. This paper discusses the "scaling laws".

Weaknesses

1. While the "curse of consistency" is indeed fascinating, discussing only the upper bound fails to capture the true nature of errors. An increase in the upper bound does not necessarily indicate a corresponding increase in error.
2. Since the primary advantage of your method lies in its training speed, I believe you may have overlooked an important scenario: "pretraining + iCT tuning," as illustrated in Figure 2.
3. The relationship between your primary observation and your main method is not c

Code & Models

Repositories

locuslab/ect
PyTorch · Official

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Stochastic Gradient Optimization Techniques · Domain Adaptation and Few-Shot Learning

Methods: Diffusion