TL;DR
CiteAudit is a benchmark and detection framework for hallucinated citations in scientific writing. It pairs a multi-agent verification pipeline with a large, human-validated dataset.
Contribution
The paper presents a multi-agent verification pipeline and a large-scale, human-validated dataset for detecting hallucinated citations. In the reported experiments, the pipeline outperforms state-of-the-art LLMs and commercial baselines.
Findings
The framework achieves higher verification performance than state-of-the-art LLMs and commercial baselines.
A large, human-validated dataset was constructed, spanning diverse domains and hallucination types.
Code is publicly available at https://github.com/shiiiikw/CiteAudit.
Abstract
Scientific research relies on citation integrity, yet large language models (LLMs) have introduced a critical risk: fabricated references that appear plausible but correspond to no real publications. As manual verification becomes infeasible and existing automated tools remain fragile, we introduce CiteAudit, a comprehensive benchmark and detection framework for hallucinated citations. We design a multi-agent verification pipeline that decomposes citation checking into metadata extraction, memory lookup, web-based retrieval, and final judgment. To evaluate this, we construct a large-scale, human-validated dataset spanning diverse domains and hallucination types. Experiments demonstrate that our framework achieves superior verification performance over state-of-the-art LLMs and commercial baselines. Our work provides the necessary infrastructure to audit citations at scale and safeguard…
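The four stages named in the abstract (metadata extraction, memory lookup, web-based retrieval, final judgment) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Metadata` class, the function names, the verdict labels (`verified`, `metadata_mismatch`, `not_found`), the assumed "Authors. Title. Venue, Year." reference format, and the stubbed `search` callable are all hypothetical, chosen only to show how the stages could chain together.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Optional


@dataclass(frozen=True)
class Metadata:
    """Hypothetical record holding the fields this sketch compares."""
    title: str
    year: Optional[int]


def normalize(title: str) -> str:
    """Lowercase and collapse punctuation so near-identical titles compare equal."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def extract_metadata(raw: str) -> Metadata:
    """Stage 1 (toy): assumes 'Authors. Title. Venue, Year.' formatting."""
    parts = [p.strip() for p in raw.split(".") if p.strip()]
    title = parts[1] if len(parts) > 1 else raw.strip()
    match = re.search(r"\b(19|20)\d{2}\b", raw)
    return Metadata(title=title, year=int(match.group()) if match else None)


def judge(meta: Metadata, candidates: List[Metadata]) -> str:
    """Stage 4 (toy): compare the extracted metadata with retrieved candidates."""
    key = normalize(meta.title)
    title_hits = [c for c in candidates if normalize(c.title) == key]
    if not title_hits:
        return "not_found"
    if meta.year is None or any(c.year == meta.year for c in title_hits):
        return "verified"
    return "metadata_mismatch"


def verify(
    raw: str,
    memory: Dict[str, Metadata],
    search: Callable[[str], Iterable[Metadata]],
) -> str:
    """Chain the four stages for one raw reference string."""
    meta = extract_metadata(raw)
    key = normalize(meta.title)
    if key in memory:
        # Stage 2: a local store of known publications, keyed by normalized title.
        candidates = [memory[key]]
    else:
        # Stage 3: fall back to retrieval; `search` is a stub, not a real web API.
        candidates = list(search(meta.title))
    return judge(meta, candidates)
```

In this sketch a reference is checked against the local memory first and sent to the retrieval stub only on a miss, so the more expensive lookup runs only when the cheaper one fails. A real system would replace `search` with an actual retrieval backend and `judge` with a more tolerant comparison (authors, venue, fuzzy title matching).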
