On the Memorization of Consistency Distillation for Diffusion Models
Bingqing Jiang, Difan Zou

TL;DR
This paper investigates how consistency distillation affects memorization in diffusion models. When the teacher has memorized data, distillation reduces the memorization transferred to the student while preserving, and sometimes improving, sample quality. The claim is supported by empirical and theoretical analysis.
Contribution
It provides the first analysis of how consistency distillation reshapes memorization in diffusion models, combining empirical results with a theoretical framework.
Findings
Consistency distillation reduces the memorization a student inherits from a teacher model that has memorized data.
Distillation preserves, and sometimes improves, sample quality even as it reduces memorization.
Theoretical analysis shows distillation suppresses unstable, memorization-related feature directions.
Abstract
Diffusion models are central to modern generative modeling, and understanding how they balance memorization and generalization is critical for reliable deployment. Recent work has shown that memorization in diffusion models is shaped by training dynamics, with generalization and memorization emerging at different stages of training. However, deployed diffusion models are often further distilled, introducing an additional training phase whose impact on memorization is not well understood. In this work, we analyze how distillation reshapes memorization behavior in diffusion models, taking consistency distillation as a representative framework. Empirically, we show that when applied to a teacher model that has memorized data, consistency distillation significantly reduces transferred memorization in the student while preserving, and sometimes improving, sample quality. To explain this…
