Literary Evidence Retrieval via Long-Context Language Models

Katherine Thai; Mohit Iyyer

arXiv:2506.03090 · cs.CL · June 4, 2025

Open Access

TL;DR

This paper evaluates long-context language models on literary evidence retrieval, showing that recent models can outperform humans in generating missing quotations from literary texts, but still struggle with nuanced signals.

Contribution

The paper introduces a benchmark of 292 human-verified examples for literary evidence retrieval with long-context LLMs, built from the RELiC dataset, and compares open-weight and closed models on it.

Findings

01

Gemini Pro 2.5 reaches 62.5% accuracy, exceeding human expert performance (50%)

02

The best open-weight model achieves only 29.1% accuracy

03

Models struggle with nuanced literary signals

Abstract

How well do modern long-context language models understand literary fiction? We explore this question via the task of literary evidence retrieval, repurposing the RELiC dataset of Thai et al. (2022) to construct a benchmark where the entire text of a primary source (e.g., The Great Gatsby) is provided to an LLM alongside literary criticism with a missing quotation from that work. This setting, in which the model must generate the missing quotation, mirrors the human process of literary analysis by requiring models to perform both global narrative reasoning and close textual examination. We curate a high-quality subset of 292 examples through extensive filtering and human verification. Our experiments show that recent reasoning models, such as Gemini Pro 2.5, can exceed human expert performance (62.5% vs. 50% accuracy). In contrast, the best open-weight model achieves only 29.1% accuracy,…
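The task setup described in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption for illustration only: the `[MASKED QUOTE]` placeholder, the prompt wording, and the normalized exact-match scoring are hypothetical and are not taken from the paper, whose actual prompt format and correctness criterion are not given in the text above.

```python
import re

# Hypothetical placeholder marking where the quotation was removed
# from the criticism excerpt (an assumption, not the paper's format).
MASK = "[MASKED QUOTE]"


def build_prompt(primary_text: str, criticism: str) -> str:
    """Combine the full primary source and the gapped criticism into one prompt."""
    if MASK not in criticism:
        raise ValueError("criticism excerpt must contain the mask placeholder")
    return (
        "Below is the full text of a literary work, followed by an excerpt of "
        "literary criticism with one quotation removed.\n\n"
        f"PRIMARY SOURCE:\n{primary_text}\n\n"
        f"CRITICISM:\n{criticism}\n\n"
        f"Reply with the passage from the primary source that replaces {MASK}."
    )


def normalize(text: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace before comparing."""
    text = re.sub(r"[^\w\s]", "", text.lower())
    return " ".join(text.split())


def is_correct(predicted: str, gold: str) -> bool:
    """Assumed scoring rule: normalized exact match against the gold quotation."""
    return normalize(predicted) == normalize(gold)


def accuracy(predictions: list[str], golds: list[str]) -> float:
    """Fraction of examples whose prediction matches the gold quotation."""
    if len(predictions) != len(golds):
        raise ValueError("predictions and golds must have the same length")
    if not golds:
        return 0.0
    hits = sum(is_correct(p, g) for p, g in zip(predictions, golds))
    return hits / len(golds)
```

In such a setup, the model's reply for each of the benchmark examples would be passed to `accuracy` together with the gold quotations. A stricter or looser matching rule (for example, allowing partial overlap) would change the reported numbers, so this sketch should not be read as reproducing the figures quoted in the abstract.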

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques