Hallucinations in Bibliographic Recommendation: Citation Frequency as a Proxy for Training Data Redundancy
Junichiro Niimi

TL;DR
This study investigates how citation frequency influences hallucination rates in bibliographic recommendations generated by LLMs, revealing that highly cited papers are more accurately reproduced and tend to be memorized beyond a certain citation threshold.
Contribution
It introduces citation count as a proxy for training data redundancy and empirically analyzes its impact on hallucination rates in LLM-generated bibliographic data.
Findings
Hallucination rates vary across research domains.
Citation count correlates strongly with factual accuracy.
Memorization occurs beyond approximately 1,000 citations.
Abstract
Large language models (LLMs) have been increasingly applied to a wide range of tasks, from natural language understanding to code generation. While they have also been used to assist in bibliographic recommendation, the hallucination of non-existent papers remains a major issue. Building on prior studies, this study hypothesizes that an LLM's ability to correctly produce bibliographic information depends on whether the underlying knowledge is generated or memorized, with highly cited papers (i.e., those that appear more frequently in the training corpus) showing lower hallucination rates. We therefore treat citation count as a proxy for training data redundancy (i.e., the frequency with which a given bibliographic record is represented in the pretraining corpus) and investigate how citation frequency affects hallucinated references in LLM outputs. Using GPT-4.1, we generated and…
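The relationship the abstract describes, hallucination rate as a function of citation count, can be illustrated with a minimal sketch. This is not the paper's pipeline: the function name, the order-of-magnitude binning, and the toy records are all assumptions made for illustration, and the numbers are invented rather than taken from the study.

```python
from collections import defaultdict


def hallucination_rate_by_citation_bin(records):
    """Group records by order of magnitude of citation count.

    records: iterable of (citation_count, is_hallucinated) pairs, where
    citation_count is a non-negative int. Bin k covers counts with k+1
    digits, i.e. 0-9 -> 0, 10-99 -> 1, 100-999 -> 2, 1000-9999 -> 3, ...
    Returns {bin: fraction of records in that bin flagged as hallucinated}.
    (Hypothetical helper; the binning scheme is an assumption.)
    """
    totals = defaultdict(int)
    hallucinated = defaultdict(int)
    for citations, is_hallucinated in records:
        # Digit count gives an exact order-of-magnitude bin for integers.
        k = len(str(citations)) - 1
        totals[k] += 1
        if is_hallucinated:
            hallucinated[k] += 1
    return {k: hallucinated[k] / totals[k] for k in sorted(totals)}


# Toy, made-up records: (citation count, was the generated reference wrong?)
toy_records = [
    (5, True), (8, True),
    (40, True), (75, False),
    (300, False), (900, True),
    (1500, False), (12000, False),
]

rates = hallucination_rate_by_citation_bin(toy_records)
for k, rate in rates.items():
    print(f"10^{k} to 10^{k + 1} citations: hallucination rate {rate:.2f}")
```

Binning by order of magnitude is one simple way to look for a threshold such as the roughly 1,000-citation mark listed in the findings; a real analysis would need verified ground-truth labels for each generated reference.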
