Detecting Reference Errors in Scientific Literature with Large Language Models
Tianmai M. Zhang, Neil F. Abernethy

TL;DR
This paper investigates the use of GPT large language models to automatically detect citation and quotation errors in scientific literature, aiming to improve accuracy in scientific publishing.
Contribution
It demonstrates that GPT models can identify reference errors from limited context and without fine-tuning, a step toward AI-assisted checking of references in scientific review.
Findings
GPT models detect citation errors with limited context
Detection achieved without model fine-tuning
Evaluation used an expert-annotated dataset
Abstract
Reference errors, such as citation and quotation errors, are common in scientific papers. Such errors can result in the propagation of inaccurate information, but are difficult and time-consuming to detect, posing a significant challenge to scientific publishing. To support automatic detection of reference errors, this work evaluated the ability of large language models in OpenAI's GPT family to detect quotation errors. Specifically, we prepared an expert-annotated, general-domain dataset of statement-reference pairs from journal articles. Large language models were evaluated in different settings with varying amounts of reference information provided by retrieval augmentation. Our results showed that large language models are able to detect erroneous citations with limited context and without fine-tuning. This study contributes to the growing literature that seeks to utilize artificial…
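The evaluation setting described in the abstract, where a model sees a statement plus a varying amount of retrieved reference text, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the token-overlap retriever stands in for whatever retrieval the authors used, the function names (`tokenize`, `overlap_score`, `retrieve`, `build_prompt`) are hypothetical, and the prompt wording is invented rather than taken from the paper. The sketch only assembles the prompt text; sending it to a model is left out.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def overlap_score(statement, passage):
    """Count word tokens shared by the statement and a passage.

    Assumption: simple token overlap is a stand-in for a real retriever.
    """
    s, p = Counter(tokenize(statement)), Counter(tokenize(passage))
    return sum((s & p).values())


def retrieve(statement, passages, k):
    """Return the k passages with the highest overlap score, best first."""
    ranked = sorted(passages, key=lambda p: overlap_score(statement, p), reverse=True)
    return ranked[:k]


def build_prompt(statement, passages, k):
    """Assemble a prompt asking whether the reference supports the statement.

    k controls how much reference text is included; k=0 gives no context.
    The wording of the prompt is hypothetical.
    """
    context = retrieve(statement, passages, k)
    lines = ["Statement: " + statement]
    if context:
        lines.append("Reference excerpts:")
        lines.extend("- " + c for c in context)
    else:
        lines.append("Reference excerpts: (none provided)")
    lines.append("Does the reference support the statement? Answer 'supported' or 'not supported'.")
    return "\n".join(lines)
```

Calling `build_prompt` with different values of `k` on the same statement-reference pair gives prompts with more or less reference context, which is one way to compare detection quality across context sizes.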
