When AI Eats Itself: On the Caveats of AI Autophagy

Xiaodan Xing; Fadong Shi; Jiahao Huang; Yinzhe Wu; Yang Nan; Sheng Zhang; Yingying Fang; Mike Roberts; Carola-Bibiane Schönlieb; Javier Del Ser; and Guang Yang

arXiv:2405.09597·cs.LG·November 11, 2024·3 cites

Open Access

TL;DR

This paper explores the phenomenon of AI autophagy, where generative AI systems consume their own outputs, raising concerns about data contamination, model performance, and ethical implications, and discusses strategies for sustainable development.

Contribution

It provides a comprehensive analysis of AI autophagy, highlighting its risks and proposing mitigation strategies for sustainable generative AI development.

Findings

01. Uncontrolled dissemination of synthetic data contaminates datasets.

02. AI autophagy may degrade model performance and reliability.

03. Strategies are needed to balance real and synthetic data use.

Abstract

Generative Artificial Intelligence (AI) technologies and large models are producing realistic outputs across various domains, such as images, text, speech, and music. Creating these advanced generative models requires significant resources, particularly large and high-quality datasets. To minimise training expenses, many algorithm developers use data created by the models themselves as a cost-effective training solution. However, not all synthetic data effectively improve model performance, necessitating a strategic balance in the use of real versus synthetic data to optimise outcomes. Currently, the previously well-controlled integration of real and synthetic data is becoming uncontrollable. The widespread and unregulated dissemination of synthetic data online leads to the contamination of datasets traditionally compiled through web scraping, now mixed with unlabeled synthetic data.…

Taxonomy

Topics: Ethics and Social Impacts of AI