Diachronic Topics in New High German Poetry
Thomas N. Haider

TL;DR
This paper applies Latent Dirichlet Allocation (LDA), an unsupervised topic model, to New High German poetry, tracing how topics change over time and using the resulting topic distributions for period classification and authorship attribution.
Contribution
In a preliminary study, it shows that LDA topic distributions can be used to classify poetry by time period and by author in a large literary corpus (51k poems), illustrating the method's utility for digital humanities research.
Findings
LDA successfully classifies poems into historical periods.
Topic distributions assist in authorship attribution.
The approach is scalable to large literary datasets.
Abstract
Statistical topic models are increasingly popular among Digital Humanities scholars for distant reading tasks on literary data. They allow us to estimate what people talk about. Latent Dirichlet Allocation (LDA) in particular has proven useful, as it is unsupervised, robust, easy to use, and scalable, and it offers interpretable results. In a preliminary study, we apply LDA to a corpus of New High German poetry (TextGrid, with 51k poems and 8M tokens), and use the distribution of topics over documents to classify poems into time periods and to attribute authorship.
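The pipeline the abstract describes (fit LDA, then use each poem's topic distribution as a feature vector for classification) can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the paper's actual implementation: the toy word lists, the period labels, the number of topics, and the choice of logistic regression as classifier are all assumptions made for the example.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Toy stand-in documents (invented word lists, NOT taken from the TextGrid corpus).
poems = [
    "herz liebe rose nacht mond",
    "liebe herz sehnsucht mond stern",
    "gott himmel seele gnade tod",
    "seele gott tod ewigkeit himmel",
]
# Hypothetical period labels, one per poem.
periods = ["romantik", "romantik", "barock", "barock"]


def topic_features(texts, n_topics=2, seed=0):
    """Fit LDA on bag-of-words counts; return per-document topic distributions."""
    vectorizer = CountVectorizer()
    counts = vectorizer.fit_transform(texts)
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=seed)
    doc_topics = lda.fit_transform(counts)  # shape: (n_documents, n_topics)
    return doc_topics, lda, vectorizer


doc_topics, lda, vectorizer = topic_features(poems)

# Use the document-topic distributions as features for period classification.
# The same features could be paired with author labels for authorship attribution.
clf = LogisticRegression().fit(doc_topics, periods)
predicted = clf.predict(doc_topics)
```

On a real corpus the classifier would be evaluated on held-out poems rather than on its training data, and the number of topics would be chosen far larger than two.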
Taxonomy
Methods: Latent Dirichlet Allocation
