The Impact of AI-Generated Text on the Internet
Jonas Dolezal, Sawood Alam, Mark Graham, Maty Bohacek

TL;DR
This study estimates that by mid-2025, roughly 35% of newly published websites were AI-generated or AI-assisted. It reports that the prevalence of AI text correlates with semantic diversity and sentiment but not with factual accuracy or stylistic diversity, which contrasts with public perception.
Contribution
It provides the first large-scale estimate of AI-generated content on the internet and examines its perceived and actual impacts on diversity and accuracy.
Findings
Roughly 35% of newly published websites are classified as AI-generated or AI-assisted by mid-2025
AI-generated content correlates with lower semantic diversity
Public perception overestimates negative impacts of AI text
Abstract
The proliferation of AI-generated and AI-assisted text on the internet is feared to contribute to a degradation in semantic and stylistic diversity, factual accuracy, and other negative developments (sometimes subsumed under the Dead Internet Theory). What has hindered answering these questions is that it has not been understood just how much of the internet is actually AI-generated or AI-edited. To this end, we construct a representative sample of websites published on the internet between 2022 and 2025 using the Internet Archive, and apply a state-of-the-art AI text detector to them. We find that by mid-2025, roughly 35% of newly published websites were classified as AI-generated or AI-assisted, up from zero before ChatGPT's launch in late 2022. We also find statistically significant evidence for some of the identified hypotheses; for example, that increases in AI-generated text on…
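The headline estimate described in the abstract is, at its core, a proportion: the share of sampled pages in a given period that a detector labels as AI-generated or AI-assisted. The sketch below illustrates that calculation only. It is not the authors' pipeline; the function name `ai_share_by_period`, the record layout, the normal-approximation confidence interval, and the toy records are all assumptions made for illustration, and the toy numbers are not data from the paper.

```python
from collections import defaultdict
from math import sqrt


def ai_share_by_period(records):
    """Return {period: (share, ci_low, ci_high)} for detector labels.

    `records` is an iterable of (period, is_ai) pairs, where `period` is a
    sortable label such as "2025-06" and `is_ai` is the detector's boolean
    verdict for one sampled page. Hypothetical layout, not the paper's.
    """
    counts = defaultdict(lambda: [0, 0])  # period -> [ai_pages, all_pages]
    for period, is_ai in records:
        counts[period][0] += int(is_ai)
        counts[period][1] += 1

    shares = {}
    for period in sorted(counts):
        ai_pages, all_pages = counts[period]
        share = ai_pages / all_pages
        # 95% normal-approximation interval; crude for small samples.
        half_width = 1.96 * sqrt(share * (1 - share) / all_pages)
        shares[period] = (
            share,
            max(0.0, share - half_width),
            min(1.0, share + half_width),
        )
    return shares


# Made-up toy labels, for illustration only (not results from the study).
toy_records = (
    [("2022-06", False)] * 4
    + [("2025-06", True)] * 1
    + [("2025-06", False)] * 3
)
toy_shares = ai_share_by_period(toy_records)
```

In a real analysis the detector's false-positive and false-negative rates would also affect the estimate, which this sketch does not model.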
