Self-Consuming Generative Models with Adversarially Curated Data
Xiukun Wei, Xueru Zhang

TL;DR
This paper investigates how generative models evolve when retrained with synthetic data that is curated by users, including noisy or maliciously manipulated data, and proposes algorithms to exploit such adversarial curation.
Contribution
It provides a theoretical analysis of the effects of noisy and adversarial data curation on self-consuming generative models and introduces attack algorithms for adversarial scenarios.
Findings
Theoretical conditions for robustness of generative models under noisy curation.
Effective attack algorithms for adversarial data manipulation.
Experimental validation on synthetic and real datasets.
Abstract
Recent advances in generative models have made it increasingly difficult to distinguish real data from model-generated synthetic data. Using synthetic data for successive training of future model generations creates "self-consuming loops", which may lead to model collapse or training instability. Furthermore, synthetic data is often subject to human feedback and curated by users based on their preferences. Ferbach et al. (2024) recently showed that when data is curated according to user preferences, the self-consuming retraining loop drives the model to converge toward a distribution that optimizes those preferences. However, in practice, data curation is often noisy or adversarially manipulated. For example, competing platforms may recruit malicious users to adversarially curate data and disrupt rival models. In this paper, we study how generative models evolve under self-consuming…
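The loop described in the abstract can be illustrated with a toy simulation. The sketch below is not the paper's algorithm. It assumes a 1-D Gaussian "model", a hypothetical reward that prefers samples near 2.0, and a curator who picks one of k candidates by a softmax over rewards (a Bradley-Terry-style choice rule, assumed here for illustration). The parameter flip_prob is a made-up knob: with that probability the curator acts adversarially and the rewards are negated.

```python
import math
import random
import statistics


def reward(x, target=2.0):
    # Hypothetical user preference: samples closer to `target` score higher.
    return -abs(x - target)


def curate(candidates, flip_prob, rng):
    # Softmax choice over rewards. With probability flip_prob the curator
    # is adversarial, so the rewards are negated before the choice is made.
    sign = -1.0 if rng.random() < flip_prob else 1.0
    scores = [sign * reward(c) for c in candidates]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    return rng.choices(candidates, weights=weights, k=1)[0]


def self_consuming_loop(rounds=20, n=500, k=4, flip_prob=0.0, seed=0):
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0  # initial toy model: a standard Gaussian
    for _ in range(rounds):
        curated = []
        for _ in range(n):
            # The current model generates k synthetic candidates ...
            candidates = [rng.gauss(mu, sigma) for _ in range(k)]
            # ... and the curator keeps one of them.
            curated.append(curate(candidates, flip_prob, rng))
        # Retrain on curated synthetic data only (Gaussian maximum likelihood).
        mu = statistics.fmean(curated)
        sigma = max(statistics.pstdev(curated), 1e-3)  # floor avoids sigma = 0
    return mu, sigma


if __name__ == "__main__":
    print("benign curation:     ", self_consuming_loop(flip_prob=0.0))
    print("adversarial curation:", self_consuming_loop(flip_prob=1.0))
```

In this toy setting, benign curation pulls the model mean toward the preferred value, while fully adversarial curation pushes it away. Intermediate values of flip_prob give a rough feel for noisy curation, though nothing here reproduces the paper's theoretical conditions or attack algorithms.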
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Image Processing and 3D Reconstruction · Adversarial Robustness in Machine Learning
