STONYBOOK: A System and Resource for Large-Scale Analysis of Novels

Charuta Pethe; Allen Kim; Rajesh Prabhakar; Tanzir Pial; Steven Skiena

arXiv:2311.03614 · cs.CL · November 8, 2023 · 1 citation

Open Access

TL;DR

This paper introduces STONYBOOK, a comprehensive system and resource for large-scale analysis of novels, including an annotation pipeline, a large collection of annotated novels, and tools for aggregate literary analysis.

Contribution

It provides an open-source end-to-end NLP pipeline that annotates novels into a standard XML format, a collection of 49,207 cleaned and annotated novels, and a database with a web interface for large-scale aggregate literary analysis.

Findings

01

Visualizations of character interactions and occurrences

02

Analysis of vocabulary and readability metrics

03

Comparison of similar books based on annotations

Abstract

Books have historically been the primary mechanism through which narratives are transmitted. We have developed a collection of resources for the large-scale analysis of novels, including: (1) an open-source end-to-end NLP analysis pipeline for the annotation of novels into a standard XML format, (2) a collection of 49,207 distinct cleaned and annotated novels, and (3) a database with an associated web interface for the large-scale aggregate analysis of these literary works. We describe the major functionalities provided in the annotation system along with their utilities. We present samples of analysis artifacts from our website, such as visualizations of character occurrences and interactions, similar books, representative vocabulary, part-of-speech statistics, and readability metrics. We also describe the use of the annotated format in qualitative and quantitative analysis across…

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Authorship Attribution and Profiling