BOOKCOREF: Coreference Resolution at Book Scale

Giuliano Martinelli; Tommaso Bonomo; Pere-Lluís Huguet Cabot; Roberto Navigli

arXiv:2507.12075 · cs.CL · July 17, 2025

TL;DR

This paper introduces BOOKCOREF, a novel large-scale benchmark for coreference resolution on full-length books, along with an automatic annotation pipeline, revealing current models' limitations at this scale.

Contribution

The paper presents the first book-scale coreference benchmark, together with the automatic annotation pipeline used to build it, enabling evaluation of coreference systems on documents that average more than 200,000 tokens.

Findings

01. With BOOKCOREF, current long-document coreference systems gain up to +20 CoNLL-F1 points on full books.

02. Current models still struggle with book-scale texts.

03. The benchmark reveals new challenges in long-document coreference.

Abstract

Coreference Resolution systems are typically evaluated on benchmarks containing small- to medium-scale documents. When it comes to evaluating long texts, however, existing benchmarks, such as LitBank, remain limited in length and do not adequately assess system capabilities at the book scale, i.e., when co-referring mentions span hundreds of thousands of tokens. To fill this gap, we first put forward a novel automatic pipeline that produces high-quality Coreference Resolution annotations on full narrative texts. Then, we adopt this pipeline to create the first book-scale coreference benchmark, BOOKCOREF, with an average document length of more than 200,000 tokens. We carry out a series of experiments showing the robustness of our automatic procedure and demonstrating the value of our resource, which enables current long-document coreference systems to gain up to +20 CoNLL-F1 points when…
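The gains above are reported in CoNLL-F1. As the metric is commonly defined for coreference evaluation, it is the unweighted mean of three F1 scores: MUC, B³, and CEAF-e. The sketch below only illustrates that arithmetic; the function names are hypothetical and the numbers in the usage lines are made up for illustration, not results from the paper.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def conll_f1(muc_f1: float, b_cubed_f1: float, ceaf_e_f1: float) -> float:
    """Unweighted average of the MUC, B-cubed, and CEAF-e F1 scores."""
    return (muc_f1 + b_cubed_f1 + ceaf_e_f1) / 3.0


# Illustrative values only (assumed, not taken from the paper).
baseline = conll_f1(60.0, 45.0, 30.0)
improved = conll_f1(75.0, 66.0, 54.0)
gain = improved - baseline  # difference in CoNLL-F1 points
```

Because the three component metrics weight links, mentions, and entities differently, averaging them is a way to avoid rewarding a system that optimizes only one view of the clustering.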

Code & Models

Datasets

sapienzanlp/bookcoref (dataset, 540 downloads)

Videos

BOOKCOREF: Coreference Resolution at Book Scale (Underline)

Taxonomy

Topics: Intelligent Tutoring Systems and Adaptive Learning · Innovative Teaching and Learning Methods