Evaluating Computational Representations of Character: An Austen Character Similarity Benchmark
Funing Yang, Carolyn Jane Anderson

TL;DR
This paper introduces AustenAlike, a benchmark suite for evaluating computational character similarity in Jane Austen's novels, revealing current systems' strengths and limitations in capturing nuanced literary similarities.
Contribution
It presents AustenAlike as a novel benchmark for character similarity, integrating structural, social, and expert literary perspectives, and evaluates existing NLP pipelines against it.
Findings
Computational models capture broad social and narrative similarities.
Expert-defined similarities are challenging for current systems.
GPT-4 rankings provide a useful comparison baseline.
Abstract
Several systems have been developed to extract information about characters to aid computational analysis of English literature. We propose character similarity grouping as a holistic evaluation task for these pipelines. We present AustenAlike, a benchmark suite of character similarities in Jane Austen's novels. Our benchmark draws on three notions of character similarity: a structurally defined notion of similarity; a socially defined notion of similarity; and an expert-defined set extracted from literary criticism. We use AustenAlike to evaluate character features extracted using two pipelines, BookNLP and FanfictionNLP. We build character representations from four kinds of features and compare them to the three AustenAlike benchmarks and to GPT-4 similarity rankings. We find that though computational representations capture some broad similarities based on shared social and…
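The evaluation idea in the abstract (build a representation per character, rank other characters by similarity, and check the ranking against benchmark groupings) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the toy word counts, the two groups, the choice of cosine similarity, and the top-1 agreement score are hypothetical and are not taken from the paper, BookNLP, or FanfictionNLP.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy features (NOT from the paper): counts of words
# associated with each character, standing in for extracted features.
features = {
    "Elizabeth": Counter({"walk": 3, "laugh": 4, "refuse": 2}),
    "Emma": Counter({"laugh": 3, "plan": 4, "refuse": 1}),
    "Darcy": Counter({"write": 3, "pride": 4, "estate": 2}),
    "Knightley": Counter({"estate": 3, "advise": 4, "write": 1}),
}

# Hypothetical benchmark groups (illustrative only): characters that a
# benchmark would treat as similar to one another.
groups = [{"Elizabeth", "Emma"}, {"Darcy", "Knightley"}]


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def ranked_neighbors(name):
    """All other characters, most similar first."""
    others = [c for c in features if c != name]
    return sorted(
        others,
        key=lambda c: cosine(features[name], features[c]),
        reverse=True,
    )


def top1_agreement():
    """Fraction of characters whose nearest neighbor is in their own group."""
    hits = 0
    total = 0
    for group in groups:
        for name in group:
            total += 1
            if ranked_neighbors(name)[0] in group:
                hits += 1
    return hits / total


if __name__ == "__main__":
    print(ranked_neighbors("Elizabeth"))
    print(top1_agreement())
```

A score like this only asks whether the single nearest neighbor falls in the same group; a fuller comparison would look at the whole ranking, and the paper's actual metrics may differ.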
Taxonomy
Topics: Computational and Text Analysis Methods · Authorship Attribution and Profiling · Data Analysis with R
Methods: Attention Is All You Need · Sparse Evolutionary Training · Linear Layer · Adam · Layer Normalization · Position-Wise Feed-Forward Layer · Dense Connections · Residual Connection · Multi-Head Attention · Byte Pair Encoding
