KRISTEVA: Close Reading as a Novel Task for Benchmarking Interpretive Reasoning

Peiqi Sui; Juan Diego Rodriguez; Philippe Laban; Dean Murphy; Joseph P. Dexter; Richard Jean So; Samuel Baker; Pramit Chaudhuri

arXiv:2505.09825 · cs.CL · June 4, 2025

TL;DR

KRISTEVA introduces a novel benchmark for assessing large language models' interpretive reasoning in close reading, a key literary analysis skill, revealing current models' limitations compared to human experts.

Contribution

This paper presents KRISTEVA, the first close reading benchmark: 1331 multiple-choice questions adapted from classroom data, organized into tasks that evaluate LLMs' interpretive reasoning in literary analysis.

Findings

01 LLMs achieve 49.7%-69.7% accuracy on close reading tasks.

02 Models underperform human evaluators on most tasks.

03 KRISTEVA enables systematic evaluation of interpretive reasoning in LLMs.

Abstract

Each year, tens of millions of essays are written and graded in college-level English courses. Students are asked to analyze literary and cultural texts through a process known as close reading, in which they gather textual details to formulate evidence-based arguments. Despite being viewed as a basis for critical thinking and widely adopted as a required element of university coursework, close reading has never been evaluated on large language models (LLMs), and multi-discipline benchmarks like MMLU do not include literature as a subject. To fill this gap, we present KRISTEVA, the first close reading benchmark for evaluating interpretive reasoning, consisting of 1331 multiple-choice questions adapted from classroom data. With KRISTEVA, we propose three progressively more difficult sets of tasks to approximate different elements of the close reading process, which we use to test how…

Taxonomy

Topics: Education and Critical Thinking Development