LLM Web Dynamics: Tracing Model Collapse in a Network of LLMs
Tianyu Wang, Akira Horiguchi, Lingyou Pang, Carey E. Priebe

TL;DR
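This paper introduces LLM Web Dynamics (LWD), a framework for studying model collapse across a network of LLMs rather than in a single model. It simulates the Internet with a retrieval-augmented generation (RAG) database, analyzes how model outputs converge, and supports that convergence with theoretical guarantees drawn from an analogy to interacting Gaussian Mixture Models.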
Contribution
The paper presents a network-level framework for studying model collapse in LLMs. It simulates the Internet with a retrieval-augmented generation (RAG) database and backs the observed convergence with a theoretical analysis based on interacting Gaussian Mixture Models.
Findings
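Model outputs in the simulated LLM network tend to converge.
The framework captures model collapse dynamics at the network level, beyond single-model settings and purely statistical surrogates.
Theoretical guarantees, derived by analogy to interacting Gaussian Mixture Models, support the observed convergence.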
Abstract
The increasing use of synthetic data from the public Internet has enhanced data usage efficiency in large language model (LLM) training. However, the potential threat of model collapse remains insufficiently explored. Existing studies primarily examine model collapse in a single model setting or rely solely on statistical surrogates. In this work, we introduce LLM Web Dynamics (LWD), an efficient framework for investigating model collapse at the network level. By simulating the Internet with a retrieval-augmented generation (RAG) database, we analyze the convergence pattern of model outputs. Furthermore, we provide theoretical guarantees for this convergence by drawing an analogy to interacting Gaussian Mixture Models.
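The network-level idea in the abstract can be illustrated with a toy simulation. The sketch below is not the authors' LWD implementation. It is a minimal statistical stand-in under several assumptions: each "model" is a single 1-D Gaussian (a one-component simplification of a Gaussian Mixture Model), the "Internet" is a shared pool of samples, and every model refits entirely on a random retrieval from that pool each round. The names `simulate` and `mean_spread` and all parameter values are hypothetical.

```python
import random
import statistics


def mean_spread(models):
    """Gap between the largest and smallest model mean."""
    means = [mu for mu, _ in models]
    return max(means) - min(means)


def simulate(num_models=3, rounds=10, samples_per_model=200, seed=0):
    """Toy network of Gaussian 'models' sharing one sample pool.

    Assumption: full refit on pooled data every round; real systems
    would mix in original data and use far richer models.
    """
    rng = random.Random(seed)
    # Start the models far apart: means -5, 0, 5, ... with unit std.
    models = [(-5.0 + 5.0 * k, 1.0) for k in range(num_models)]
    history = [models]
    for _ in range(rounds):
        # Every model posts synthetic samples to the shared pool.
        pool = [
            rng.gauss(mu, sigma)
            for mu, sigma in models
            for _ in range(samples_per_model)
        ]
        # Each model retrieves from the pool and refits mean and std.
        refit = []
        for _ in models:
            retrieved = rng.sample(pool, samples_per_model)
            refit.append(
                (statistics.fmean(retrieved), statistics.stdev(retrieved))
            )
        models = refit
        history.append(models)
    return history


if __name__ == "__main__":
    history = simulate()
    print("initial spread of means:", mean_spread(history[0]))
    print("final spread of means:  ", mean_spread(history[-1]))
```

Because every model draws from the same pool, their fitted parameters are pulled toward one another, which is the kind of convergence the abstract refers to. The toy says nothing about the paper's actual rates or guarantees.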
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis
