AbsenceBench: Language Models Can't Tell What's Missing

Harvey Yiyun Fu; Aryan Shrivastava; Jared Moore; Peter West; Chenhao Tan; Ari Holtzman

arXiv:2506.11440·cs.CL·June 16, 2025

TL;DR

AbsenceBench evaluates large language models' ability to detect deliberately omitted information across various domains, revealing significant limitations in their capacity to identify missing content despite their strengths in recalling known information.

Contribution

This paper introduces AbsenceBench, a new benchmark specifically designed to assess LLMs' ability to detect missing information, highlighting a fundamental limitation of transformer attention mechanisms.

Findings

01 Even state-of-the-art models such as Claude-3.7-Sonnet achieve only a 69.6% F1-score on AbsenceBench, at a modest average context length of 5K tokens.

02 Models struggle to attend to gaps in documents, which the paper attributes to limitations of the attention mechanism.

03 Performance drops are consistent across different domains and context lengths.

Abstract

Large language models (LLMs) are increasingly capable of processing long inputs and locating specific information within them, as evidenced by their performance on the Needle in a Haystack (NIAH) test. However, while models excel at recalling surprising information, they still struggle to identify clearly omitted information. We introduce AbsenceBench to assess LLMs' capacity to detect missing information across three domains: numerical sequences, poetry, and GitHub pull requests. AbsenceBench asks models to identify which pieces of a document were deliberately removed, given access to both the original and edited contexts. Despite the apparent straightforwardness of these tasks, our experiments reveal that even state-of-the-art models like Claude-3.7-Sonnet achieve only 69.6% F1-score with a modest average context length of 5K tokens. Our analysis suggests this poor performance stems…
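The task setup described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the function names (`make_omission_task`, `f1_score`), the 20% omission rate, and the line-level set matching are all assumptions chosen for illustration. It builds an edited copy of a numerical sequence by dropping some lines, then scores a list of predicted omissions against the true omissions with F1.

```python
import random


def make_omission_task(lines, omit_fraction=0.2, seed=0):
    """Drop a random subset of lines (hypothetical helper, not from the paper).

    Returns (edited_lines, omitted_lines), both in original order.
    """
    rng = random.Random(seed)
    n_omit = max(1, int(len(lines) * omit_fraction))
    omitted_idx = set(rng.sample(range(len(lines)), n_omit))
    edited = [ln for i, ln in enumerate(lines) if i not in omitted_idx]
    omitted = [ln for i, ln in enumerate(lines) if i in omitted_idx]
    return edited, omitted


def f1_score(predicted, gold):
    """Set-based F1 between predicted and true omitted lines.

    Assumes lines are unique within a document, which holds for the toy
    sequence below but is an assumption in general.
    """
    pred_set, gold_set = set(predicted), set(gold)
    true_pos = len(pred_set & gold_set)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(pred_set)
    recall = true_pos / len(gold_set)
    return 2 * precision * recall / (precision + recall)


# Toy "numerical sequence" document: the integers 10..29, one per line.
original = [str(n) for n in range(10, 30)]
edited, omitted = make_omission_task(original, omit_fraction=0.2, seed=0)

# A trivial program recovers the omissions exactly by set difference,
# which is why the reported model scores are surprising.
edited_set = set(edited)
recovered = [ln for ln in original if ln not in edited_set]
print(f1_score(recovered, omitted))
```

In an actual evaluation, `recovered` would be replaced by the lines a model lists as missing after being shown both `original` and `edited`.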

Code & Models

Datasets

harveyfin/AbsenceBench
dataset · 123 dl

Taxonomy

Topics: Natural Language Processing Techniques · Semantic Web and Ontologies · Topic Modeling