On Finding Inconsistencies in Documents

Charles J. Lovering; Seth Ebner; Brandon Smock; Michael Krumdick; Saad Rabbani; Ahmed Muhammad; Varshini Reddy; Chris Tanner

arXiv:2512.18601·cs.CL·December 23, 2025

Open Access · 1 dataset

TL;DR

This paper introduces FIND, a benchmark that tests whether language models can detect inconsistencies in long, technical documents. The best-performing model, gpt-5, recovered 64% of the expert-inserted inconsistencies, leaving 36% undetected.

Contribution

The paper presents FIND (Finding INconsistencies in Documents), a benchmark in which each example is a long, technical document containing an inconsistency inserted manually by a domain expert.

Findings

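01 gpt-5, the best-performing model, recovered 64% of the inserted inconsistencies.

02 gpt-5 also found inconsistencies already present in the original documents: on 50 arXiv papers, 136 of its 196 suggestions were judged legitimate.

03 Inconsistency detection remains a challenging task for current models.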

Abstract

Professionals in academia, law, and finance audit their documents because inconsistencies can result in monetary, reputational, and scientific costs. Language models (LMs) have the potential to dramatically speed up this auditing process. To understand their abilities, we introduce a benchmark, FIND (Finding INconsistencies in Documents), where each example is a document with an inconsistency inserted manually by a domain expert. Despite the documents being long, technical, and complex, the best-performing model (gpt-5) recovered 64% of the inserted inconsistencies. Surprisingly, gpt-5 also found undiscovered inconsistencies present in the original documents. For example, on 50 arXiv papers, we judged 136 out of 196 of the model's suggestions to be legitimate inconsistencies missed by the original authors. However, despite these findings, even the best models miss almost half of the…
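
As a quick check on the figures above, the share of gpt-5's suggestions judged legitimate on the 50 arXiv papers can be worked out directly. This is only an illustrative sketch: the helper name `rate` is hypothetical and not from the paper, and the only inputs are the two counts quoted in the abstract (136 and 196).

```python
def rate(hits: int, total: int) -> float:
    """Return hits / total, rejecting empty or inconsistent counts."""
    if total <= 0 or not 0 <= hits <= total:
        raise ValueError("need total > 0 and 0 <= hits <= total")
    return hits / total


# Counts quoted in the abstract for the 50 arXiv papers.
legitimate = 136   # suggestions judged to be real inconsistencies
suggested = 196    # all suggestions made by the model

precision = rate(legitimate, suggested)
rejected = suggested - legitimate  # suggestions not judged legitimate

print(f"legitimate share: {precision:.1%}")  # legitimate share: 69.4%
print(f"rejected suggestions: {rejected}")   # rejected suggestions: 60
```

So roughly seven in ten suggestions on these papers were judged legitimate. The 64% figure for inserted inconsistencies is a separate measure (how many planted errors were recovered), and the abstract does not give the counts behind it, so it is not recomputed here.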

Code & Models

Datasets

kensho/FIND
dataset · 54 downloads

Taxonomy

Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Computational and Text Analysis Methods