On the Dangers of Bootstrapping Generation for Continual Learning and Beyond
Daniil Zverev, A. Sophia Koepke, Joao F. Henriques

TL;DR
This paper investigates the risks of repeatedly training on synthetic data in continual learning. It shows that this practice introduces bias, leads to model collapse, and degrades performance, which calls the reliability of training on generated data into question.
Contribution
It provides a statistical analysis of synthetic data's impact and empirical evidence of model collapse, highlighting limitations of current generative experience replay methods.
Findings
Synthetic data introduces significant bias and variance into training objectives.
Generative models collapse under repeated synthetic data training.
State-of-the-art GER methods fail to maintain latent space alignment.
Abstract
The use of synthetically generated data for training models is becoming a common practice. While generated data can augment the training data, repeated training on synthetic data raises concerns about distribution drift and degradation of performance due to contamination of the dataset. We investigate the consequences of this bootstrapping process through the lens of continual learning, drawing a connection to Generative Experience Replay (GER) methods. We present a statistical analysis showing that synthetic data introduces significant bias and variance into training objectives, weakening the reliability of maximum likelihood estimation. We provide empirical evidence showing that popular generative models collapse under repeated training with synthetic data. We quantify this degradation and show that state-of-the-art GER methods fail to maintain alignment in the latent space. Our…
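The bootstrapping process described in the abstract can be illustrated with a toy sketch. This is not the authors' experiment: it assumes a one-dimensional Gaussian as the "generative model", fitted by maximum likelihood to samples drawn from the previous generation's fit. The function name `bootstrap_gaussian`, the sample size, and the number of generations are all hypothetical choices made for illustration.

```python
import random
import statistics


def bootstrap_gaussian(n_samples=10, n_generations=200, seed=0):
    """Repeatedly refit a Gaussian to samples drawn from the previous fit.

    Returns a list of (mean, std) pairs, one per generation, starting with
    the parameters of the original ("real") distribution.
    """
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0  # generation 0: assumed real data distribution
    history = [(mu, sigma)]
    for _ in range(n_generations):
        # Draw a purely synthetic dataset from the current model.
        data = [rng.gauss(mu, sigma) for _ in range(n_samples)]
        # Maximum likelihood fit: sample mean and biased (population) std.
        mu = statistics.fmean(data)
        sigma = statistics.pstdev(data)
        history.append((mu, sigma))
    return history


if __name__ == "__main__":
    history = bootstrap_gaussian()
    for gen in (0, 10, 50, 100, 200):
        mu, sigma = history[gen]
        print(f"generation {gen:3d}: mean={mu:+.4f}, std={sigma:.3e}")
```

In this toy setting the maximum likelihood variance estimate has expectation (n - 1)/n times the true variance, so the expected variance shrinks by that factor at every generation while the mean drifts at random. With small sample sizes the fitted distribution typically narrows toward a point, which is a simple analogue of the collapse the paper reports for much larger generative models.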
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis · Data Stream Mining Techniques
