The Attribution Crisis in LLM Search Results

Ilan Strauss; Jangho Yang; Tim O'Reilly; Sruly Rosenblat; Isobel Moure

arXiv:2508.00838 · cs.DL · August 5, 2025

TL;DR

This paper investigates the attribution gap in web-enabled LLMs, meaning the difference between the relevant URLs a model reads and those it actually cites. It finds low citation rates despite extensive consumption of web content and advocates transparent search architectures.

Contribution

The paper provides an empirical analysis of attribution practices in web-enabled LLMs, identifies exploitation patterns, and proposes standards for transparency in search and citation logging.

Findings

01

34% of Google Gemini responses are generated without explicitly fetching any online content

02

92% of Gemini answers have no clickable citation source

03

Citation efficiency varies across models, from 0.19 to 0.45

Abstract

Web-enabled LLMs frequently answer queries without crediting the web pages they consume, creating an "attribution gap" - the difference between relevant URLs read and those actually cited. Drawing on approximately 14,000 real-world LMArena conversation logs with search-enabled LLM systems, we document three exploitation patterns: 1) No search: 34% of Google Gemini and 24% of OpenAI GPT-4o responses are generated without explicitly fetching any online content; 2) No citation: Gemini provides no clickable citation source in 92% of answers; 3) High-volume, low-credit: Perplexity's Sonar visits approximately 10 relevant pages per query but cites only three to four. A negative binomial hurdle model shows that the average query answered by Gemini or Sonar leaves about 3 relevant websites uncited, whereas GPT-4o's tiny uncited gap is best explained by its selective log disclosures rather than…
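The attribution gap defined in the abstract can be illustrated with a minimal sketch. It assumes each answer's log has already been reduced to two sets: the relevant URLs the model read and the URLs it cited. The `ResponseLog` record, its field names, and the example URLs are hypothetical and do not come from the paper or from the LMArena logs. Counting the gap as a set difference is one plausible reading of the definition, not necessarily the authors' exact procedure.

```python
from dataclasses import dataclass, field


@dataclass
class ResponseLog:
    """Hypothetical per-answer record; field names are assumptions."""
    relevant_urls_read: set[str] = field(default_factory=set)
    urls_cited: set[str] = field(default_factory=set)


def attribution_gap(log: ResponseLog) -> int:
    """Count relevant URLs that were read but never cited."""
    return len(log.relevant_urls_read - log.urls_cited)


def mean_gap(logs: list[ResponseLog]) -> float:
    """Average attribution gap over a non-empty list of answers."""
    if not logs:
        raise ValueError("need at least one log record")
    return sum(attribution_gap(log) for log in logs) / len(logs)


# Made-up illustrative data, not taken from the paper.
example = ResponseLog(
    relevant_urls_read={
        "https://example.com/a",
        "https://example.com/b",
        "https://example.com/c",
    },
    urls_cited={"https://example.com/a"},
)
print(attribution_gap(example))  # 2
```

An answer that fetches nothing has a gap of zero under this definition, which is why the no-search and no-citation shares are reported separately from the gap itself.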
