GhostCite: A Large-Scale Analysis of Citation Validity in the Age of Large Language Models

Zuyao Xu; Yuqi Qiu; Lu Sun; Fasheng Miao; Fubin Wu; Xiang Li; Xinyi Wang; Haozhe Lu; Zhengze Zhang; Yuxin Hu; Jialu Li; Luo Jin; Feng Zhang; Rui Luo; Xinran Liu; Yingxian Li; Jiaji Liu

arXiv:2602.06718 · cs.CR · May 15, 2026

TL;DR

This paper investigates the prevalence of fabricated citations by Large Language Models and their impact on academic trust, providing a large-scale analysis and a framework for citation verification.

Contribution

It introduces \citeb, a framework for large-scale citation verification, and presents a comprehensive study of citation validity in the era of LLMs.

Findings

1. All tested LLMs hallucinate citations at rates from 14.23% to 94.93%.

2. 1.07% of analyzed papers contain invalid citations, increasing by 80.9% in 2025.

3. Most researchers and reviewers do not thoroughly verify citations, risking academic integrity.

Abstract

Citations provide the basis for trusting scientific claims; when they are invalid or fabricated, this trust collapses. With the advent of Large Language Models (LLMs), this risk has intensified: LLMs are increasingly used for academic writing, but their tendency to fabricate citations ("ghost citations") poses a systemic threat to citation validity. To quantify this threat, we develop \citeb, an open-source framework for large-scale citation verification, and conduct a comprehensive study of citation validity in the LLM era through three complementary experiments. First, we benchmark 13 LLMs on a citation generation task across various research domains, finding that all models hallucinate citations at rates from 14.23% to 94.93%. Second, we analyze 2.2 million citations from 56,381 papers at AI/ML and Security venues (2020-2025), finding that 1.07% of papers contain invalid citations,…
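The abstract reports two different kinds of percentage: a rate over individual citations (the share of generated citations that are hallucinated) and a rate over papers (the share of papers containing at least one invalid citation). The sketch below illustrates only that distinction. It is not the paper's framework: the `Citation` record, the `is_valid` flag, both function names, and the toy data are all hypothetical, and how a citation is judged valid is left entirely outside the example.

```python
from dataclasses import dataclass


@dataclass
class Citation:
    paper_id: str   # hypothetical identifier of the citing paper
    is_valid: bool  # hypothetical verdict from some external verifier


def citation_invalid_rate(citations):
    """Percentage of individual citations judged invalid."""
    if not citations:
        return 0.0
    invalid = sum(1 for c in citations if not c.is_valid)
    return 100.0 * invalid / len(citations)


def paper_invalid_rate(citations):
    """Percentage of papers that contain at least one invalid citation."""
    papers = {c.paper_id for c in citations}
    if not papers:
        return 0.0
    flagged = {c.paper_id for c in citations if not c.is_valid}
    return 100.0 * len(flagged) / len(papers)


# Made-up toy data: four papers, six citations, one invalid citation.
toy = [
    Citation("p1", True), Citation("p1", False),
    Citation("p2", True), Citation("p2", True),
    Citation("p3", True), Citation("p4", True),
]

print(f"citation-level: {citation_invalid_rate(toy):.2f}%")
print(f"paper-level:    {paper_invalid_rate(toy):.2f}%")
```

On the same data the two rates differ, because a single invalid citation flags its whole paper. Figures at the two levels are therefore not directly comparable.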
