Detecting Reference Errors in Scientific Literature with Large Language Models

Tianmai M. Zhang; Neil F. Abernethy

arXiv:2411.06101 · cs.CL · April 3, 2026 · 2 cites

TL;DR

This paper investigates whether large language models from OpenAI's GPT family can automatically detect citation and quotation errors in scientific literature, with the aim of improving accuracy in scientific publishing.

Contribution

It shows that GPT models can identify reference errors with limited context and without fine-tuning, advancing AI-assisted scientific review.

Findings

01. GPT models detect citation errors with limited context
02. Detection achieved without model fine-tuning
03. Evaluation used an expert-annotated dataset
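Evaluating a detector against expert annotations comes down to comparing predicted labels with gold labels. The following is a minimal sketch, assuming a hypothetical binary label scheme ("error" vs. "supported") and standard precision, recall, F1, and accuracy. The paper's actual annotation categories and metrics are not given on this page, and the function name `error_detection_metrics` is our own.

```python
def error_detection_metrics(gold: list[str], pred: list[str],
                            positive: str = "error") -> dict[str, float]:
    """Score predicted labels against expert labels.

    Assumes a hypothetical binary scheme where `positive` marks a reference error.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    pairs = list(zip(gold, pred))
    tp = sum(g == positive and p == positive for g, p in pairs)  # errors correctly flagged
    fp = sum(g != positive and p == positive for g, p in pairs)  # correct citations wrongly flagged
    fn = sum(g == positive and p != positive for g, p in pairs)  # errors the model missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = sum(g == p for g, p in pairs) / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}
```

For example, with toy gold labels `["error", "error", "supported", "supported", "error"]` and predictions `["error", "supported", "supported", "error", "error"]`, precision, recall, and F1 are all 2/3 and accuracy is 0.6. Treating "error" as the positive class separates how many flagged citations are real errors (precision) from how many real errors are caught (recall), which an editor screening manuscripts may weigh differently.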

Abstract

Reference errors, such as citation and quotation errors, are common in scientific papers. Such errors can result in the propagation of inaccurate information, but are difficult and time-consuming to detect, posing a significant challenge to scientific publishing. To support automatic detection of reference errors, this work evaluated the ability of large language models in OpenAI's GPT family to detect quotation errors. Specifically, we prepared an expert-annotated, general-domain dataset of statement-reference pairs from journal articles. Large language models were evaluated in different settings with varying amounts of reference information provided by retrieval augmentation. Our results showed that large language models are able to detect erroneous citations with limited context and without fine-tuning. This study contributes to the growing literature that seeks to utilize artificial…
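The abstract describes giving the model varying amounts of reference information through retrieval augmentation. A minimal sketch of that idea follows, assuming three hypothetical context levels (title only; title plus abstract; title plus abstract plus retrieved passages) and a simple token-overlap retriever swapped in for illustration. The function names, prompt wording, chunk size, and answer labels are all assumptions, not the authors' pipeline, and no model is called.

```python
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    # Lowercased alphanumeric tokens; a deliberately simple tokenizer.
    return re.findall(r"[a-z0-9]+", text.lower())

def chunk(text: str, size: int = 50) -> list[str]:
    # Split the reference text into fixed-size word windows (hypothetical scheme).
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def overlap_score(statement: str, passage: str) -> int:
    # Number of tokens shared by statement and passage, counted with multiplicity.
    shared = Counter(tokenize(statement)) & Counter(tokenize(passage))
    return sum(shared.values())

def retrieve(statement: str, reference_text: str, k: int = 2, size: int = 50) -> list[str]:
    # Rank chunks by lexical overlap; sorted() is stable, so ties keep document order.
    chunks = chunk(reference_text, size)
    return sorted(chunks, key=lambda c: overlap_score(statement, c), reverse=True)[:k]

# Hypothetical context levels, from least to most reference information.
SETTINGS = ("title", "abstract", "retrieved")

def build_prompt(statement: str, title: str, abstract: str, full_text: str,
                 setting: str = "retrieved", k: int = 2, size: int = 50) -> str:
    if setting not in SETTINGS:
        raise ValueError(f"unknown setting: {setting!r}")
    context = [f"Reference title: {title}"]
    if setting in ("abstract", "retrieved"):
        context.append(f"Reference abstract: {abstract}")
    if setting == "retrieved":
        passages = retrieve(statement, full_text, k=k, size=size)
        for i, passage in enumerate(passages, start=1):
            context.append(f"Retrieved passage {i}: {passage}")
    # Illustrative instruction wording; not the prompt used in the paper.
    return (
        "Does the cited reference support the statement below? "
        "Answer 'supported' or 'error'.\n\n"
        f"Statement: {statement}\n\n" + "\n".join(context)
    )

if __name__ == "__main__":
    full_text = ("We enrolled 200 adults aged 18 to 65. "
                 "Drug X reduced mortality by 12 percent overall. "
                 "No subgroup analysis by age was performed.")
    print(build_prompt("Drug X cut mortality by 40 percent in elderly patients.",
                       "A trial of drug X", "We tested drug X in adults.",
                       full_text, setting="retrieved", k=1, size=8))
```

Lexical overlap keeps the sketch dependency-free. A real system would more likely use embedding-based retrieval, but the structure is the same: the `setting` argument controls how much of the reference reaches the model.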
