LLM Web Dynamics: Tracing Model Collapse in a Network of LLMs

Tianyu Wang; Akira Horiguchi; Lingyou Pang; Carey E. Priebe

arXiv:2506.15690 · cs.LG · July 25, 2025

TL;DR

This paper introduces LLM Web Dynamics, a framework for analyzing how large language models behave collectively on the Internet. It reveals convergence patterns in model outputs and provides theoretical guarantees for this convergence, which is how model collapse manifests in a network of LLMs.

Contribution

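The paper presents a novel network-level framework for studying model collapse in LLMs. It simulates the Internet with a retrieval-augmented generation (RAG) database and supports the observed behavior with a theoretical analysis based on an analogy to interacting Gaussian Mixture Models.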
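
The interacting Gaussian Mixture Model analogy can be illustrated with a toy simulation. This is a hedged sketch under stated assumptions, not the authors' model: each hypothetical "LLM" is a 1-D Gaussian with a fixed standard deviation, and in every round it refits its mean on data drawn partly from its own outputs and partly from a shared pool of every model's outputs, so its training distribution is itself a Gaussian mixture. The function name `simulate_interacting_gaussians` and the parameters `own_weight`, `n_samples`, and `rounds` are illustrative assumptions.

```python
import random
import statistics


def simulate_interacting_gaussians(means, sigma=1.0, own_weight=0.5,
                                   n_samples=200, rounds=30, seed=0):
    """Hypothetical toy (not the paper's model): each "LLM" is N(mu_k, sigma^2).

    Each round, every model publishes samples to a shared pool, then refits
    its mean on a mix of its own samples and samples drawn from the pool.
    Returns the final means and the max-min spread of means per round.
    """
    rng = random.Random(seed)
    mus = list(means)
    spreads = [max(mus) - min(mus)]
    for _ in range(rounds):
        # Every model generates outputs; together they form the shared "web" pool.
        outputs = [[rng.gauss(mu, sigma) for _ in range(n_samples)] for mu in mus]
        pool = [x for out in outputs for x in out]
        n_own = int(own_weight * n_samples)
        new_mus = []
        for k in range(len(mus)):
            # Training data = own outputs + retrieved pool samples (a Gaussian mixture).
            data = outputs[k][:n_own] + rng.sample(pool, n_samples - n_own)
            # Refit only the mean (sigma stays fixed to isolate mean convergence).
            new_mus.append(statistics.fmean(data))
        mus = new_mus
        spreads.append(max(mus) - min(mus))
    return mus, spreads


if __name__ == "__main__":
    final_mus, spreads = simulate_interacting_gaussians([-5.0, 0.0, 5.0])
    print("initial spread:", spreads[0])
    print("final spread:", round(spreads[-1], 3))
```

Starting from means of -5, 0, and 5, the gap between models should shrink by roughly the factor `own_weight` per round until it reaches the sampling-noise level. Holding the variance fixed is a deliberate simplification; refitting it as well would add single-model variance collapse on top of the network-level convergence.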

Findings

1. Model outputs tend to converge in the simulated LLM network.
2. The framework effectively captures the dynamics of model collapse.
3. Theoretical guarantees support the observed convergence patterns.

Abstract

The increasing use of synthetic data from the public Internet has enhanced data usage efficiency in large language model (LLM) training. However, the potential threat of model collapse remains insufficiently explored. Existing studies primarily examine model collapse in a single model setting or rely solely on statistical surrogates. In this work, we introduce LLM Web Dynamics (LWD), an efficient framework for investigating model collapse at the network level. By simulating the Internet with a retrieval-augmented generation (RAG) database, we analyze the convergence pattern of model outputs. Furthermore, we provide theoretical guarantees for this convergence by drawing an analogy to interacting Gaussian Mixture Models.
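
The RAG-database setup described in the abstract can be pictured as a feedback loop: agents retrieve documents from a shared store, generate text conditioned on what they retrieved, and publish that text back to the store. The following is a hypothetical toy, not the LWD implementation. Each "agent" is a categorical distribution over a four-token vocabulary, "generation conditioned on retrieval" is modeled as blending the agent's distribution with the token frequencies of `k` retrieved documents (weight `mix`), and convergence is measured as the average pairwise total variation distance between agents. All names (`simulate_rag_web`, `tv_distance`, `VOCAB`) are illustrative assumptions.

```python
import random
from collections import Counter
from itertools import combinations

VOCAB = ["a", "b", "c", "d"]  # hypothetical toy vocabulary


def tv_distance(p, q):
    """Total variation distance between two categorical distributions."""
    return 0.5 * sum(abs(p[t] - q[t]) for t in VOCAB)


def simulate_rag_web(agent_dists, rounds=50, doc_len=50, k=5, mix=0.5, seed=0):
    """Hypothetical toy of a network of agents sharing a retrieval database."""
    rng = random.Random(seed)
    dists = [dict(d) for d in agent_dists]

    def generate(d):
        return rng.choices(VOCAB, weights=[d[t] for t in VOCAB], k=doc_len)

    # Seed the shared database (the simulated "Internet") with one doc per agent.
    database = [generate(d) for d in dists]
    history = []
    for _ in range(rounds):
        new_docs = []
        for i, d in enumerate(dists):
            # Retrieval step: sample k documents from the shared database.
            retrieved = rng.sample(database, min(k, len(database)))
            counts = Counter(tok for doc in retrieved for tok in doc)
            total = sum(counts.values())
            # Conditioning step (assumed): blend own distribution with retrieved frequencies.
            dists[i] = {t: mix * d[t] + (1 - mix) * counts[t] / total for t in VOCAB}
            new_docs.append(generate(dists[i]))
        # Publishing step: this round's outputs become retrievable next round.
        database.extend(new_docs)
        pairs = list(combinations(dists, 2))
        history.append(sum(tv_distance(p, q) for p, q in pairs) / len(pairs))
    return dists, history


if __name__ == "__main__":
    agents = [
        {"a": 0.7, "b": 0.1, "c": 0.1, "d": 0.1},
        {"a": 0.1, "b": 0.7, "c": 0.1, "d": 0.1},
        {"a": 0.1, "b": 0.1, "c": 0.7, "d": 0.1},
    ]
    final, history = simulate_rag_web(agents)
    print("mean pairwise TV after round 1:", round(history[0], 3))
    print("mean pairwise TV after last round:", round(history[-1], 3))
```

With three agents that each favor a different token, the initial pairwise distance is 0.6. The first round halves it to 0.3, because with `mix=0.5` every agent retrieves the same three seed documents, and later rounds keep pulling the agents toward the shared database's token mix.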

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis