On the Stability of Iterative Retraining of Generative Models on their own Data
Quentin Bertrand, Avishek Joey Bose, Alexandre Duplessis, Marco Jiralerspong, and Gauthier Gidel

TL;DR
This paper investigates the stability of iterative training of deep generative models on their own generated data, providing theoretical conditions and empirical validation for maintaining model quality over successive generations.
Contribution
It introduces a rigorous framework analyzing the stability of iterative retraining of generative models on mixed datasets, including synthetic data from previous models.
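One natural way to write the mixed-dataset retraining step is sketched below. The notation is assumed here for illustration and may not match the paper's exact statement: θ_t are the model parameters at generation t, p_data is the real data distribution, and λ ≥ 0 is the ratio of synthetic to real data. Setting λ = 0 recovers classical maximum-likelihood training on real data, while dropping the real-data term gives the fully self-consuming regime.

```latex
\theta_{t+1} \in \arg\max_{\theta \in \Theta}
  \Big( \mathbb{E}_{x \sim p_{\mathrm{data}}}\big[\log p_\theta(x)\big]
  + \lambda \, \mathbb{E}_{\tilde{x} \sim p_{\theta_t}}\big[\log p_\theta(\tilde{x})\big] \Big)
```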
Findings
Iterative training remains stable if initial models are accurate and the proportion of real data is sufficiently high.
A theoretical proof that iterative retraining is stable under these conditions.
Empirical validation on CIFAR10 and FFHQ datasets with normalizing flows and diffusion models.
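The findings can be illustrated with a deliberately tiny stand-in for the paper's deep models: a one-dimensional Gaussian refit by maximum likelihood at every generation. This is a hypothetical toy, not the authors' experimental setup. The function `retrain_gaussian`, the ratio `lam`, and the sample sizes are assumptions chosen for illustration. Each generation keeps the same real samples and adds `lam * n` fresh samples from the previous fit. Passing `lam=None` stands in for the fully synthetic regime, where the real data is dropped.

```python
import numpy as np


def retrain_gaussian(real, n_generations, lam, rng):
    """Refit a 1-D Gaussian for n_generations rounds (hypothetical toy illustration).

    real: fixed array of "real" samples, reused at every generation.
    lam: hypothetical synthetic-to-real ratio; each round adds int(lam * len(real))
         fresh samples from the previous fit. lam=None means fully synthetic:
         the real data is dropped and len(real) samples come from the previous fit.
    """
    n = len(real)
    # Generation 0: maximum-likelihood fit on the real data only.
    mu, sigma = real.mean(), real.std()
    for _ in range(n_generations):
        if lam is None:
            # Self-consuming regime: train only on the previous model's samples.
            data = rng.normal(mu, sigma, size=n)
        else:
            # Mixed regime: fixed real data plus fresh synthetic samples.
            synthetic = rng.normal(mu, sigma, size=int(lam * n))
            data = np.concatenate([real, synthetic])
        # Gaussian MLE: sample mean and ddof=0 standard deviation.
        mu, sigma = data.mean(), data.std()
    return mu, sigma


rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, size=100)  # toy "real" data drawn from N(0, 1)

mixed = retrain_gaussian(real, n_generations=1000, lam=0.5, rng=rng)
synthetic_only = retrain_gaussian(real, n_generations=1000, lam=None, rng=rng)
print("mixed (mu, sigma):", mixed)
print("synthetic only (mu, sigma):", synthetic_only)
```

With real data retained, the estimates stay close to the generation-0 fit on the real samples. In the fully synthetic run, sigma tends to drift toward zero, because the ddof=0 variance estimate shrinks by a factor (n - 1)/n in expectation at every generation.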
Abstract
Deep generative models have made tremendous progress in modeling complex data, often exhibiting generation quality that surpasses a typical human's ability to discern the authenticity of samples. Undeniably, a key driver of this success is the massive amount of web-scale data consumed by these models. Due to these models' striking performance and wide availability, the web will inevitably be increasingly populated with synthetic content. This implies that future iterations of generative models will be trained on both clean and artificially generated data from past models. In this paper, we develop a framework to rigorously study the impact of training generative models on mixed datasets, ranging from classical training on real data to self-consuming generative models trained on purely synthetic data. We first prove the stability of iterative training under the…
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Music and Audio Processing · Computational Physics and Python Applications
Methods: Diffusion · Normalizing Flows
