BookWorm: A Dataset for Character Description and Analysis
Argyrios Papoudakis, Mirella Lapata, Frank Keller

TL;DR
This paper introduces BookWorm, a dataset for character description and analysis in full-length books, and uses it to evaluate models that must process long narratives to generate factual character profiles and interpretive analyses.
Contribution
The paper presents the BookWorm dataset and benchmarks state-of-the-art models for character understanding in full-length books, highlighting retrieval-based methods as most effective.
Findings
Retrieval-based approaches outperform hierarchical processing on both tasks (character description and character analysis).
Fine-tuned models with coreference retrieval produce more factual descriptions.
Models show promising results in zero-shot and fine-tuning settings.
Abstract
Characters are at the heart of every story, driving the plot and engaging readers. In this study, we explore the understanding of characters in full-length books, which contain complex narratives and numerous interacting characters. We define two tasks: character description, which generates a brief factual profile, and character analysis, which offers an in-depth interpretation, including character development, personality, and social context. We introduce the BookWorm dataset, pairing books from the Gutenberg Project with human-written descriptions and analyses. Using this dataset, we evaluate state-of-the-art long-context models in zero-shot and fine-tuning settings, utilizing both retrieval-based and hierarchical processing for book-length inputs. Our findings show that retrieval-based approaches outperform hierarchical ones in both tasks. Additionally, fine-tuned models using…
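The contrast between retrieval-based and hierarchical processing of book-length inputs can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `retrieve_character_passages` and `split_into_chunks`, the alias list, and the word budgets are all hypothetical, and plain name matching stands in here for the coreference-based retrieval the paper reports. The retrieval function keeps only paragraphs that mention the character, up to a context budget; the hierarchical function covers the whole book in contiguous chunks that a model would process one at a time before merging the outputs.

```python
import re
from typing import List


def retrieve_character_passages(
    paragraphs: List[str], aliases: List[str], budget_words: int
) -> List[str]:
    """Retrieval-style input: paragraphs mentioning the character, in book order.

    Assumption: simple case-insensitive name matching on a hand-given alias
    list, used as a stand-in for coreference resolution.
    """
    if not aliases:
        return []
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(a) for a in aliases) + r")\b",
        re.IGNORECASE,
    )
    selected: List[str] = []
    used = 0
    for para in paragraphs:
        if not pattern.search(para):
            continue
        n_words = len(para.split())
        if used + n_words > budget_words:
            break  # stop once the (hypothetical) context budget is full
        selected.append(para)
        used += n_words
    return selected


def split_into_chunks(paragraphs: List[str], chunk_words: int) -> List[List[str]]:
    """Hierarchical-style input: contiguous chunks that cover the whole book."""
    chunks: List[List[str]] = []
    current: List[str] = []
    used = 0
    for para in paragraphs:
        n_words = len(para.split())
        if current and used + n_words > chunk_words:
            chunks.append(current)
            current, used = [], 0
        current.append(para)
        used += n_words
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    # Toy "book" for illustration only.
    book = [
        "Elizabeth walked to Netherfield in the mud.",
        "The weather was dull that week.",
        "Mr. Darcy watched Lizzy with growing interest.",
        "Elizabeth refused the first proposal.",
    ]
    print(retrieve_character_passages(book, ["Elizabeth", "Lizzy"], budget_words=15))
    print(split_into_chunks(book, chunk_words=13))
```

In this sketch the retrieved input contains only character-relevant text, while each hierarchical chunk mixes relevant and irrelevant paragraphs. That difference is one plausible intuition for the reported advantage of retrieval, though the paper itself should be consulted for the actual explanation.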
Taxonomy
Topics: Natural Language Processing Techniques · Lexicography and Language Studies · Mathematics, Computing, and Information Processing
