The Economics of Model Collapse: Equilibrium, Welfare, and Optimal Provenance Subsidies in Synthetic Data Markets
Gustav Olaf Yunus Laitinen-Fredriksson Lundström-Imanov

TL;DR
This paper develops a microeconomic theory of synthetic data markets under model collapse, introducing equilibrium concepts, welfare analysis, and algorithms to optimize provenance subsidies and watermarking strategies.
Contribution
It introduces the Synthetic Data Contamination Equilibrium (SDCE), proves its properties, and provides practical algorithms and bounds for managing model collapse in synthetic data markets.
Findings
Welfare-maximizing provenance subsidy s* = KL(q||p)/(2 kappa)
Watermark strength w* = (1 - psi) KL(q||p)/(2 kappa psi)
Scaling experiments recover a logarithmic collapse law with high R^2
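The two closed-form expressions above can be evaluated directly once their inputs are known. The sketch below is illustrative only: this summary does not define q, p, kappa, or psi, so the code assumes q and p are discrete distributions over a shared support, that kappa is a positive scalar, and that psi lies in (0, 1]. The function names are hypothetical and are not taken from the paper.

```python
import math


def kl_divergence(q, p):
    """KL(q || p) in nats for two discrete distributions (assumed input form)."""
    if len(q) != len(p):
        raise ValueError("q and p must have the same length")
    total = 0.0
    for qi, pi in zip(q, p):
        if qi == 0.0:
            continue  # 0 * log(0 / p) is taken as 0 by convention
        if pi == 0.0:
            return math.inf  # q puts mass where p has none
        total += qi * math.log(qi / pi)
    return total


def optimal_subsidy(kl, kappa):
    """s* = KL(q||p) / (2 kappa), as stated in the findings."""
    if kappa <= 0:
        raise ValueError("kappa is assumed to be positive")
    return kl / (2.0 * kappa)


def optimal_watermark_strength(kl, kappa, psi):
    """w* = (1 - psi) KL(q||p) / (2 kappa psi), as stated in the findings."""
    if kappa <= 0:
        raise ValueError("kappa is assumed to be positive")
    if not 0.0 < psi <= 1.0:
        raise ValueError("psi is assumed to lie in (0, 1]")
    return (1.0 - psi) * kl / (2.0 * kappa * psi)


if __name__ == "__main__":
    # Made-up inputs, chosen only to exercise the formulas.
    q = [0.5, 0.5]
    p = [0.25, 0.75]
    kl = kl_divergence(q, p)
    print("KL(q||p) =", kl)
    print("s* =", optimal_subsidy(kl, kappa=0.5))
    print("w* =", optimal_watermark_strength(kl, kappa=0.5, psi=0.25))
```

Dividing the second formula by the first gives w* = ((1 - psi)/psi) s*, so under these formulas the watermark strength is a rescaling of the subsidy that depends only on psi.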
Abstract
Generative artificial intelligence is rapidly transforming the supply side of training data: an increasing share of new tokens, images, and structured records is produced by previous-generation models rather than by human originators. Recursive training on such synthetic content induces a measurable and often irreversible loss of distributional fidelity, a phenomenon known as model collapse. We develop the first unified microeconomic theory of synthetic data markets under model collapse. We introduce the Synthetic Data Contamination Equilibrium (SDCE), prove existence and generic uniqueness, derive a welfare decomposition W = W_prod + W_cons - L_coll - L_info, establish a Wasserstein-gradient-flow mean-field collapse limit, prove an impossibility of information-constrained implementation, and obtain closed-form expressions for the welfare-maximizing provenance subsidy s* = KL(q||p)/(2…
