Towards Theoretical Understandings of Self-Consuming Generative Models
Shi Fu, Sen Zhang, Yingjie Wang, Xinmei Tian, Dacheng Tao

TL;DR
This paper develops a theoretical framework for analyzing how self-consuming training loops affect the data distributions that generative models learn, deriving bounds on distribution divergence and revealing phase transitions.
Contribution
It introduces a rigorous theoretical analysis of self-consuming generative models, deriving bounds on distribution divergence and identifying phase transitions in training dynamics.
Findings
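The total variation (TV) distance between the synthetic and real data distributions can be controlled when the mixed training datasets are large enough or contain a large enough proportion of real data.
A phase transition occurs in which the divergence first increases and then decreases.
The analysis offers insights into how errors propagate in kernel density estimation.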
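To make the kernel density estimation (KDE) setting concrete, here is a minimal Python sketch of a self-consuming KDE loop. It is not the authors' construction; it rests on assumed settings: a standard normal as the real distribution, a Gaussian kernel with a fixed bandwidth `h`, one fixed real dataset reused every generation, and synthetic samples drawn from the previous generation's KDE. All function names (`kde_sample`, `self_consuming_kde`, and so on) are hypothetical, and the TV distance is approximated by a midpoint rule on a finite grid.

```python
import math
import random


def gaussian_pdf(x, mu=0.0, sigma=1.0):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def kde_pdf(x, samples, h):
    """Gaussian-kernel density estimate with bandwidth h, evaluated at x."""
    return sum(gaussian_pdf(x, s, h) for s in samples) / len(samples)


def kde_sample(samples, h, n, rng):
    """Draw n points from a Gaussian KDE: pick a training point, add N(0, h^2) noise."""
    return [rng.choice(samples) + rng.gauss(0.0, h) for _ in range(n)]


def tv_distance(p, q, lo=-6.0, hi=6.0, steps=600):
    """Approximate TV(p, q) = 0.5 * integral of |p - q| with a midpoint rule on [lo, hi]."""
    dx = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * dx
        total += abs(p(x) - q(x))
    return 0.5 * total * dx


def self_consuming_kde(n_real, n_synth, generations, h=0.3, seed=0):
    """Return the TV distance to N(0, 1) of the KDE fitted at each generation.

    Hypothetical setup (illustration only): the same real dataset is reused
    every generation and mixed with n_synth samples from the previous KDE.
    """
    rng = random.Random(seed)
    real = [rng.gauss(0.0, 1.0) for _ in range(n_real)]
    train = list(real)
    tvs = []
    for _ in range(generations):
        current = list(train)  # freeze this generation's training set
        tvs.append(tv_distance(gaussian_pdf, lambda x: kde_pdf(x, current, h)))
        # next generation: fixed real data plus samples from the current KDE
        train = real + kde_sample(current, h, n_synth, rng)
    return tvs


if __name__ == "__main__":
    print(self_consuming_kde(n_real=200, n_synth=200, generations=5))
```

Varying `n_synth` relative to `n_real` is one way to probe, for this toy setting only, how the synthetic share affects the distance; it does not reproduce the paper's phase-transition result.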
Abstract
This paper tackles the emerging challenge of training generative models within a self-consuming loop, wherein successive generations of models are recursively trained on mixtures of real and synthetic data from previous generations. We construct a theoretical framework to rigorously evaluate how this training procedure impacts the data distributions learned by future models, including parametric and non-parametric models. Specifically, we derive bounds on the total variation (TV) distance between the synthetic data distributions produced by future models and the original real data distribution under various mixed training scenarios for diffusion models with a one-hidden-layer neural network score function. Our analysis demonstrates that this distance can be effectively controlled under the condition that mixed training dataset sizes or proportions of real data are large enough.…
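The mixed training loop described in the abstract can be illustrated with a deliberately simple parametric stand-in. The sketch below replaces the paper's diffusion model (with a one-hidden-layer score network) by a one-dimensional Gaussian fitted by maximum likelihood; this substitution, the function names, and the sizes used are assumptions for illustration only. Each generation is fitted to a fixed real dataset (assumed to have at least two points) mixed with samples from the previous generation's model, and the TV distance between the fitted Gaussian and the true N(0, 1) is approximated numerically.

```python
import math
import random
import statistics


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def tv_gaussians(mu1, s1, mu2, s2, lo=-10.0, hi=10.0, steps=4000):
    """Midpoint-rule approximation of TV(N(mu1, s1^2), N(mu2, s2^2))."""
    dx = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * dx
        total += abs(normal_pdf(x, mu1, s1) - normal_pdf(x, mu2, s2))
    return 0.5 * total * dx


def self_consuming_gaussian(n_real, n_synth, generations, seed=0):
    """TV distance to the real N(0, 1) for each generation's fitted Gaussian.

    Hypothetical setup: generation t is fitted by maximum likelihood on the
    fixed real dataset plus n_synth samples drawn from generation t-1's model.
    """
    rng = random.Random(seed)
    real = [rng.gauss(0.0, 1.0) for _ in range(n_real)]
    train = list(real)
    history = []
    for _ in range(generations):
        mu = statistics.fmean(train)
        sigma = statistics.pstdev(train)  # MLE of the standard deviation
        history.append(tv_gaussians(0.0, 1.0, mu, sigma))
        # next training set: fixed real data mixed with this model's samples
        train = real + [rng.gauss(mu, sigma) for _ in range(n_synth)]
    return history


if __name__ == "__main__":
    # Same synthetic budget, different amounts of real data per generation.
    for n_real in (20, 2000):
        tvs = self_consuming_gaussian(n_real=n_real, n_synth=2000, generations=10)
        print(n_real, [round(t, 3) for t in tvs])
```

Because every generation is refitted on the same real set, a larger real share tends to anchor the estimates; a single run of this toy model only illustrates the mechanism and says nothing about the paper's bounds.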
Taxonomy
Topics: Complex Systems and Decision Making
Methods: Diffusion · Early Stopping
