On the Memorization of Consistency Distillation for Diffusion Models

Bingqing Jiang; Difan Zou

arXiv:2604.23552·cs.LG·April 28, 2026

TL;DR

This paper investigates how consistency distillation affects memorization in diffusion models. Empirically, distilling a teacher that has memorized training data reduces the memorization transferred to the student while preserving, and sometimes improving, sample quality. A theoretical analysis accompanies the empirical results.

Contribution

The paper analyzes how consistency distillation reshapes memorization in diffusion models, pairing empirical results with a theoretical framework. It is presented as the first such analysis.

Findings

01. Consistency distillation reduces the memorization transferred from a teacher model that has memorized training data.

02. Distillation preserves, and sometimes improves, sample quality while reducing memorization.

03. Theoretical analysis shows that distillation suppresses unstable, memorization-related feature directions.
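
To make "reduces memorization" concrete, the sketch below computes one nearest-neighbor proxy of the kind often used in the memorization literature: a generated sample is flagged when it lies much closer to its nearest training point than to its second nearest. This is an illustration only. The summary above does not say which metric the paper uses, and the function name `memorized_fraction`, the 1/3 ratio threshold, and the toy 2-D points are all assumptions made here, not values or results from the paper.

```python
import math


def memorized_fraction(generated, training, ratio_threshold=1 / 3):
    """Return the fraction of generated samples flagged as near-copies.

    Assumed criterion (not taken from the paper): a sample is flagged when
    its distance to the nearest training point is less than
    `ratio_threshold` times its distance to the second-nearest one.
    """
    if len(training) < 2:
        raise ValueError("need at least two training points")
    if not generated:
        raise ValueError("need at least one generated sample")

    flagged = 0
    for sample in generated:
        # Euclidean distances from this sample to every training point.
        dists = sorted(math.dist(sample, point) for point in training)
        nearest, second = dists[0], dists[1]
        if nearest < ratio_threshold * second:
            flagged += 1
    return flagged / len(generated)


# Synthetic toy points, invented purely to exercise the function.
training = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
copy_heavy_samples = [(0.1, 0.0), (10.0, 0.1), (5.0, 5.0), (0.0, 9.9)]
mixed_samples = [(0.1, 0.0), (5.0, 5.0), (4.0, 4.0), (6.0, 3.0)]

print(memorized_fraction(copy_heavy_samples, training))  # 0.75
print(memorized_fraction(mixed_samples, training))       # 0.25
```

In an actual comparison, one would presumably evaluate such a proxy on samples drawn from the teacher and from the distilled student against the same training set, using image-space or feature-space distances rather than 2-D points.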

Abstract

Diffusion models are central to modern generative modeling, and understanding how they balance memorization and generalization is critical for reliable deployment. Recent work has shown that memorization in diffusion models is shaped by training dynamics, with generalization and memorization emerging at different stages of training. However, deployed diffusion models are often further distilled, introducing an additional training phase whose impact on memorization is not well understood. In this work, we analyze how distillation reshapes memorization behavior in diffusion models, taking consistency distillation as a representative framework. Empirically, we show that when applied to a teacher model that has memorized data, consistency distillation significantly reduces transferred memorization in the student while preserving, and sometimes improving, sample quality. To explain this…
