Preprint: Poster: Did I Just Browse A Website Written by LLMs?

Sichang Steven He; Ramesh Govindan; Harsha V. Madhyastha

arXiv:2507.13933·cs.NI·October 13, 2025

Open Access

TL;DR

This paper introduces a scalable, highly accurate method for detecting websites generated by large language models, revealing their increasing prevalence and potential impact on the web ecosystem.

Contribution

It proposes a novel website-level classification pipeline that outperforms existing detectors on complex web content and provides the first large-scale analysis of LLM-dominant sites.

Findings

01. Achieved 100% accuracy on ground truth datasets.

02. Detected a significant portion of LLM-dominant sites in web archives.

03. Found increasing prevalence of LLM-generated content in search results.

Abstract

Increasingly, web content is automatically generated by large language models (LLMs) with little human input. We call this "LLM-dominant" content. Since LLMs plagiarize and hallucinate, LLM-dominant content can be unreliable and unethical. Yet websites rarely disclose such content, and human readers struggle to distinguish it. Thus, we must develop reliable detectors for LLM-dominant content. However, state-of-the-art (SoTA) LLM detectors are inaccurate on web content, because web content has low positive rates, complex markup, and diverse genres, unlike the clean, prose-like benchmark data that SoTA detectors are optimized for. We propose a highly reliable, scalable pipeline that classifies entire websites. Instead of naively classifying text extracted from each page, we classify each site based on an LLM text detector's outputs on multiple prose-like pages, which boosts accuracy. We train and…
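The site-level idea in the abstract can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration: the `is_prose_like` heuristic, the mean aggregation, the thresholds, and the `detector` callable (a stand-in for any LLM text detector that returns a score in [0, 1]). The abstract does not say how the authors select pages or combine detector outputs, so this should not be read as their implementation.

```python
import re
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Iterable, Optional


@dataclass
class Page:
    url: str
    text: str  # text already extracted from the page's markup


def is_prose_like(text: str, min_words: int = 150,
                  min_avg_sentence_words: float = 8.0) -> bool:
    """Hypothetical filter: keep pages that are long enough and whose
    sentences are long enough to resemble running prose."""
    words = text.split()
    if len(words) < min_words:
        return False
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        return False
    return len(words) / len(sentences) >= min_avg_sentence_words


def classify_site(pages: Iterable[Page],
                  detector: Callable[[str], float],
                  min_pages: int = 3,
                  site_threshold: float = 0.5) -> Optional[bool]:
    """Return True if the site looks LLM-dominant, False if not, and
    None when too few prose-like pages exist to decide.

    Assumption: per-page scores are combined with a plain mean."""
    scores = [detector(p.text) for p in pages if is_prose_like(p.text)]
    if len(scores) < min_pages:
        return None
    return mean(scores) >= site_threshold


if __name__ == "__main__":
    article = " ".join(
        ["This is a plain sentence with several ordinary words in it."] * 20
    )
    site = [Page(f"https://example.com/post-{i}", article) for i in range(3)]
    site.append(Page("https://example.com/nav", "Home | About | Contact"))
    # Stub detector used only for the demo; a real detector would go here.
    print(classify_site(site, detector=lambda text: 0.9))
```

Aggregating over several pages means one misclassified page is less likely to flip the site-level decision, and the prose filter keeps navigation pages and other markup-heavy content away from a detector tuned for clean prose.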

Taxonomy

Topics: Academic Publishing and Open Access · Authorship Attribution and Profiling · Academic integrity and plagiarism