How LLMs Cite and Why It Matters: A Cross-Model Audit of Reference Fabrication in AI-Assisted Academic Writing and Methods to Detect Phantom Citations

MZ Naser

arXiv:2603.03299 · cs.CL · March 5, 2026

Open Access

TL;DR

This study audits citation hallucination in 10 commercial LLMs across four academic domains. It finds that hallucinated citations are prompt-induced, identifies effective filters for detecting them, and develops a classifier for identifying fabricated citations.

Contribution

The paper provides one of the largest audits of citation hallucination in LLMs, introduces practical detection methods, and develops a classifier to identify fabricated citations.

Findings

01

Hallucination rates vary from 11.4% to 56.8% across models and domains.

02

No model generates citations spontaneously when it is not prompted to.

03

Multi-model consensus and repetition improve hallucination detection accuracy.
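The rates in finding 01 are per-model, per-domain fractions of citations that could not be verified. The sketch below shows that bookkeeping under stated assumptions: the `(model, domain, verified)` record shape and the helper names are hypothetical, and treating a citation as verified when it matches in at least one database is an assumption, not the paper's stated rule. It also checks the "fivefold" spread quoted in the abstract by dividing the highest reported rate by the lowest.

```python
from collections import defaultdict


def hallucination_rates(records):
    """Return {(model, domain): fraction of citations that failed verification}.

    Each record is a hypothetical (model, domain, verified) tuple. Here
    `verified` is assumed to be True when the citation was matched in at
    least one scholarly database; the paper's exact rule may differ.
    """
    totals = defaultdict(int)
    failures = defaultdict(int)
    for model, domain, verified in records:
        key = (model, domain)
        totals[key] += 1
        if not verified:
            failures[key] += 1
    return {key: failures[key] / totals[key] for key in totals}


def fold_range(rates):
    """Ratio of the highest rate to the lowest (assumes the lowest is > 0)."""
    return max(rates) / min(rates)


# Toy records for illustration only; these are not data from the paper.
toy = [
    ("model-a", "domain-x", True),
    ("model-a", "domain-x", False),
    ("model-b", "domain-x", True),
]
print(hallucination_rates(toy))

# The two extremes reported in the abstract: 11.4% and 56.8%.
print(round(fold_range([0.114, 0.568]), 2))  # 4.98
```

A ratio of about 4.98 is consistent with describing the spread as roughly fivefold.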

Abstract

Large language models (LLMs) have been noted to fabricate scholarly citations, yet the scope of this behavior across providers, domains, and prompting conditions remains poorly quantified. We present one of the largest citation hallucination audits to date, in which 10 commercially deployed LLMs were prompted across four academic domains, generating 69,557 citation instances verified against three scholarly databases (namely, CrossRef, OpenAlex, and Semantic Scholar). Our results show that the observed hallucination rates span a fivefold range (between 11.4% and 56.8%) and are strongly shaped by model, domain, and prompt framing. Our results also show that no model spontaneously generates citations when unprompted, which seems to establish hallucination as prompt-induced rather than intrinsic. We identify two practical filters: 1) multi-model consensus (with more than 3 LLMs citing the…
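The multi-model consensus filter keeps a citation only when more than 3 LLMs produce it. The abstract is cut off before it says exactly what must match, so this sketch assumes a match on a normalized title. That matching rule and the function names (`normalize_title`, `consensus_counts`, `passes_consensus`) are hypothetical illustrations, not the paper's implementation.

```python
import re
from collections import defaultdict


def normalize_title(title):
    """Lowercase and collapse runs of non-alphanumerics to single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def consensus_counts(citations_by_model):
    """Map each normalized title to the number of distinct models citing it."""
    models_per_title = defaultdict(set)
    for model, titles in citations_by_model.items():
        for title in titles:
            models_per_title[normalize_title(title)].add(model)
    return {title: len(models) for title, models in models_per_title.items()}


def passes_consensus(citations_by_model, threshold=3):
    """Return titles cited by strictly more than `threshold` distinct models."""
    counts = consensus_counts(citations_by_model)
    return {title for title, n in counts.items() if n > threshold}


# Placeholder titles for illustration; these are not real references.
outputs = {
    "m1": ["Placeholder Study A", "Phantom Paper Z"],
    "m2": ["placeholder study a"],
    "m3": ["Placeholder Study A."],
    "m4": ["PLACEHOLDER  study A"],
    "m5": [],
}
print(sorted(passes_consensus(outputs)))  # ['placeholder study a']
```

Counting distinct models with a set means a single model that repeats a citation does not inflate its consensus count. A title produced by only one model, like the phantom entry above, fails the filter.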

Taxonomy

Topics: Artificial Intelligence in Healthcare and Education · Topic Modeling · Scientific Computing and Data Management