LLM hallucinations in the wild: Large-scale evidence from non-existent citations
Zhenyue Zhao, Yihe Wang, Toby Stuart, Mathijs De Vaan, Paul Ginsparg, Yian Yin

TL;DR
This study provides large-scale evidence that LLMs increasingly introduce non-existent citations into scientific papers, threatening the reliability and equity of scientific knowledge as LLM adoption grows.
Contribution
It offers the first large-scale empirical audit of LLM hallucinations in scientific citations, covering 111 million references across 2.5 million papers, and reveals their prevalence and societal implications.
Findings
Sharp rise in non-existent references post-LLM adoption
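Hallucinated citations are most pronounced in fields with rapid AI uptake and among small and early-career author teams, while disproportionately crediting already prominent and male scholars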
Current moderation processes only partially mitigate these errors
Abstract
Large language models (LLMs) are known to generate plausible but false information across a wide range of contexts, yet the real-world magnitude and consequences of this hallucination problem remain poorly understood. Here we leverage a uniquely verifiable object - scientific citations - to audit 111 million references across 2.5 million papers in arXiv, bioRxiv, SSRN, and PubMed Central. We find a sharp rise in non-existent references following widespread LLM adoption, with a conservative estimate of 146,932 hallucinated citations in 2025 alone. These errors are diffusely embedded across many papers but especially pronounced in fields with rapid AI uptake, in manuscripts with linguistic signatures of AI-assisted writing, and among small and early-career author teams. At the same time, hallucinated references disproportionately assign credit to already prominent and male scholars,…
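As a rough illustration of why citations are a verifiable object, the sketch below flags references whose titles cannot be matched to a known bibliographic index. It is not the authors' pipeline: the fuzzy title matching via `difflib`, the 0.9 threshold, the function names, and the toy index are all assumptions made for this example, and a real audit would query large bibliographic databases rather than an in-memory list.

```python
from difflib import SequenceMatcher


def normalize(title: str) -> str:
    """Lowercase the title and reduce it to alphanumeric words separated by single spaces."""
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in title)
    return " ".join(cleaned.split())


def best_match_score(cited_title: str, known_titles: list[str]) -> float:
    """Return the highest similarity ratio between the cited title and any known title."""
    target = normalize(cited_title)
    return max(
        (SequenceMatcher(None, target, normalize(k)).ratio() for k in known_titles),
        default=0.0,  # an empty index matches nothing
    )


def flag_unmatched(
    references: list[str], known_titles: list[str], threshold: float = 0.9
) -> list[str]:
    """Return the references whose best match falls below the (assumed) threshold."""
    return [r for r in references if best_match_score(r, known_titles) < threshold]


# Hypothetical toy index and reference list, for illustration only.
# The second reference is a deliberately made-up title standing in for a hallucination.
KNOWN = ["Attention Is All You Need", "Deep Residual Learning for Image Recognition"]
REFS = ["Attention is all you need.", "A Unified Theory of Quantum Citation Dynamics"]

flagged = flag_unmatched(REFS, KNOWN)
print(flagged)
```

Normalizing before comparison keeps casing and punctuation differences from counting as mismatches, so only references with no close counterpart in the index are flagged. An unmatched reference is a candidate rather than a confirmed hallucination, since the index itself may be incomplete.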
