TL;DR
This paper provides large English and German poetry corpora and annotates prosodic features in smaller corpora, which are used to train neural models that analyze rhythmic and metrical patterns in poetry at scale.
Contribution
It introduces large English and German poetry corpora, together with smaller corpora annotated for prosodic features, and demonstrates neural models that outperform a CRF baseline in metrical tagging tasks.
Findings
BiLSTM-CRF models with syllable embeddings outperform a CRF baseline and several BERT-based approaches.
In a multi-task setup, joint prediction of certain poetic features improves model performance.
Caesuras are closely linked to syntax and influence line measures.
Abstract
A prerequisite for the computational study of literature is the availability of properly digitized texts, ideally with reliable metadata and ground-truth annotation. Poetry corpora do exist for a number of languages, but larger collections lack consistency and are encoded in various standards, while annotated corpora are typically constrained to a particular genre and/or were designed for the analysis of certain linguistic features (like rhyme). In this work, we provide large poetry corpora for English and German, and annotate prosodic features in smaller corpora to train corpus-driven neural models that enable robust large-scale analysis. We show that BiLSTM-CRF models with syllable embeddings outperform a CRF baseline and different BERT-based approaches. In a multi-task setup, particular beneficial task relations illustrate the inter-dependence of poetic features. A model learns…
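To make the CRF part of the models mentioned above more concrete, here is a minimal sketch of Viterbi decoding for a linear-chain CRF over syllables. It is not the paper's implementation: the tag set ("+" for stressed, "-" for unstressed), the emission scores, and the transition scores are all hypothetical toy values. In a BiLSTM-CRF, the emission scores would presumably come from a BiLSTM over syllable embeddings rather than being written by hand.

```python
from typing import Dict, List, Tuple

# Hypothetical tag set for illustration: "+" stressed, "-" unstressed.
TAGS = ["+", "-"]


def viterbi(
    emissions: List[Dict[str, float]],
    transitions: Dict[Tuple[str, str], float],
    tags: List[str] = TAGS,
) -> Tuple[List[str], float]:
    """Return the highest-scoring tag sequence and its total score."""
    if not emissions:
        return [], 0.0
    # best[t][tag] = best score of any path that ends in `tag` at position t
    best = [{tag: emissions[0][tag] for tag in tags}]
    back = []  # back[t-1][tag] = best previous tag for `tag` at position t
    for t in range(1, len(emissions)):
        scores, pointers = {}, {}
        for cur in tags:
            prev = max(tags, key=lambda p: best[-1][p] + transitions[(p, cur)])
            scores[cur] = best[-1][prev] + transitions[(prev, cur)] + emissions[t][cur]
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final tag.
    last = max(tags, key=lambda tag: best[-1][tag])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[-1][last]


# Toy per-syllable scores (assumed, not taken from the paper).
emissions = [
    {"+": 0.2, "-": 0.6},
    {"+": 0.9, "-": 0.1},
    {"+": 0.5, "-": 0.4},  # locally prefers "+"
    {"+": 0.8, "-": 0.2},
]
# Toy transition scores that reward alternating stress.
transitions = {
    ("+", "+"): -1.0, ("+", "-"): 0.5,
    ("-", "+"): 0.5, ("-", "-"): -1.0,
}

path, score = viterbi(emissions, transitions)
print(path)  # -> ['-', '+', '-', '+']
```

Tagging each syllable independently would label the third syllable "+", since that is its higher emission score. The transition scores make the decoder prefer the alternating sequence instead, which illustrates why a sequence-level CRF layer can help with metrical patterns.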
Taxonomy
Methods: VERtex Similarity Embeddings · Conditional Random Field
