Mapping Mutable Genres in Structurally Complex Volumes

Ted Underwood; Michael L. Black; Loretta Auvil; Boris Capitanu

arXiv:1309.3323 · cs.CL · November 17, 2016

TL;DR

This paper presents a multi-layered approach to dividing large, heterogeneous digital library volumes by genre: hidden Markov models segment each volume, and ensembles of overlapping classifiers account for gradual historical change in the genres themselves.

Contribution

The paper combines hidden Markov model segmentation with ensembles of overlapping classifiers to map genre across a large digital library. The design targets two problems of scale: genres change gradually over several centuries, and volumes are internally heterogeneous.
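One way to picture an ensemble of overlapping classifiers is as a set of models, each trained on a slice of the timeline, where neighboring slices share some years. The sketch below is illustrative only and is not taken from the paper: the window width, the step, the date range, the averaging rule, and the names `overlapping_windows` and `ensemble_predict` are all assumptions, and the per-period models are stubbed out as constant functions.

```python
from statistics import mean


def overlapping_windows(start, end, width, step):
    """Split [start, end) into windows of `width` years that begin every `step` years."""
    windows = []
    lo = start
    while lo < end:
        windows.append((lo, min(lo + width, end)))
        lo += step
    return windows


def ensemble_predict(models, year, features):
    """Average the predictions of every model whose window covers `year`.

    `models` is a list of ((lo, hi), predict) pairs, where `predict` maps a
    feature vector to a probability. Simple averaging is an assumption here.
    """
    votes = [predict(features) for (lo, hi), predict in models if lo <= year < hi]
    if not votes:
        raise ValueError(f"no model covers year {year}")
    return mean(votes)


# Hypothetical setup: 100-year windows starting every 50 years.
windows = overlapping_windows(1700, 1900, width=100, step=50)

# Stub models: each returns a fixed probability, standing in for a
# classifier trained only on volumes from its own window.
stub_outputs = [0.2, 0.4, 0.6, 0.8]
models = [
    (window, (lambda features, p=p: p))
    for window, p in zip(windows, stub_outputs)
]

# A volume dated 1820 is scored by the 1750-1850 and 1800-1900 models only.
probability = ensemble_predict(models, year=1820, features=[])
```

Because a volume near a window boundary is scored by more than one period model, no single model has to fit a genre definition that holds across the whole timeline.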

Findings

1) Successfully classified 469,200 volumes from HathiTrust

2) Extracted and analyzed 32,209 fiction volumes for narrative perspective trends

3) Identified genre-specific associations with narrative points of view

Abstract

To mine large digital libraries in humanistically meaningful ways, scholars need to divide them by genre. This is a task that classification algorithms are well suited to assist, but they need adjustment to address the specific challenges of this domain. Digital libraries pose two problems of scale not usually found in the article datasets used to test these algorithms. 1) Because libraries span several centuries, the genres being identified may change gradually across the time axis. 2) Because volumes are much longer than articles, they tend to be internally heterogeneous, and the classification task needs to begin with segmentation. We describe a multi-layered solution that trains hidden Markov models to segment volumes, and uses ensembles of overlapping classifiers to address historical change. We test this approach on a collection of 469,200 volumes drawn from HathiTrust Digital…
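To make the segmentation idea concrete, here is a minimal sketch of how a hidden Markov model can smooth page-by-page genre guesses into contiguous segments using Viterbi decoding. It is not the authors' implementation: the three genre labels, the transition and initial probabilities, the example page probabilities, and the function name `viterbi_smooth` are all assumptions, and treating per-page classifier probabilities as emission scores is a simplification.

```python
import numpy as np


def viterbi_smooth(page_probs, transition, initial):
    """Return the most likely genre sequence for a volume.

    page_probs: (n_pages, n_genres) per-page scores, used here as emissions.
    transition: (n_genres, n_genres) matrix, transition[i, j] = P(next=j | current=i).
    initial:    (n_genres,) probabilities for the first page.
    """
    eps = 1e-12  # avoids log(0)
    n_pages, n_genres = page_probs.shape
    log_emit = np.log(page_probs + eps)
    log_trans = np.log(transition + eps)

    score = np.log(initial + eps) + log_emit[0]
    back = np.zeros((n_pages, n_genres), dtype=int)
    for t in range(1, n_pages):
        # cand[i, j] = best score ending in genre i at t-1, then moving to j
        cand = score[:, None] + log_trans
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + log_emit[t]

    # Trace the best path backwards from the final page.
    path = [int(score.argmax())]
    for t in range(n_pages - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]


# Hypothetical genres: 0 = front matter, 1 = fiction, 2 = back matter.
page_probs = np.array([
    [0.90, 0.05, 0.05],
    [0.10, 0.85, 0.05],
    [0.10, 0.85, 0.05],
    [0.50, 0.45, 0.05],  # noisy page: looks slightly like front matter
    [0.10, 0.85, 0.05],
    [0.02, 0.03, 0.95],
    [0.02, 0.03, 0.95],
])
# "Sticky" transitions: a page usually shares the genre of the previous page.
transition = np.array([
    [0.8, 0.1, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.8],
])
initial = np.array([0.8, 0.1, 0.1])

raw = page_probs.argmax(axis=1).tolist()
smoothed = viterbi_smooth(page_probs, transition, initial)
```

Taken page by page, the fourth page would be labeled front matter. With the sticky transition matrix, the decoded path keeps it inside the surrounding run of fiction pages, which is the kind of within-volume consistency that segmentation is meant to provide.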
