Topic modelling discourse dynamics in historical newspapers
Jani Marjanen, Elaine Zosa, Simon Hengchen, Lidia Pivovarova, Mikko Tolonen

TL;DR
This study applies LDA and DTM topic models to Finnish historical newspapers from 1854-1917 to analyze discourse changes over time, offering methodological innovations for large diachronic datasets.
Contribution
It introduces a combined sampling and inference procedure for large diachronic text collections and compares two topic models for discourse analysis.
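As a rough illustration of the sampling step for an imbalanced diachronic collection, the sketch below draws at most a fixed number of documents per year, so that later, larger years do not dominate model training. The function name `sample_per_year`, the `per_year` cap, and the toy data are assumptions made for illustration; the summary does not specify the paper's actual sampling scheme.

```python
import random
from collections import defaultdict


def sample_per_year(docs, per_year, seed=0):
    """Return at most `per_year` randomly chosen documents for each year.

    `docs` is an iterable of (year, text) pairs. This is a hypothetical
    helper, not the procedure from the paper.
    """
    rng = random.Random(seed)

    # Group document texts by publication year.
    by_year = defaultdict(list)
    for year, text in docs:
        by_year[year].append(text)

    # Draw an equal-sized (capped) random sample from every year.
    sample = []
    for year in sorted(by_year):
        texts = by_year[year]
        k = min(per_year, len(texts))
        sample.extend((year, text) for text in rng.sample(texts, k))
    return sample


# Toy collection: few documents in 1854, many in 1917.
docs = [(1854, f"a{i}") for i in range(3)] + [(1917, f"b{i}") for i in range(50)]
balanced = sample_per_year(docs, per_year=5)
```

A model trained on such a balanced sample could then be used to infer topic distributions for the full collection, which is one plausible reading of a "combined sampling and inference procedure".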
Findings
Effective application of topic models to large, imbalanced historical newspaper data
Quantification of topic prominence over time
Insights into discourse dynamics in Finnish history
Abstract
This paper addresses methodological issues in diachronic data analysis for historical research. We apply two families of topic models (LDA and DTM) on a relatively large set of historical newspapers, with the aim of capturing and understanding discourse dynamics. Our case study focuses on newspapers and periodicals published in Finland between 1854 and 1917, but our method can easily be transposed to any diachronic data. Our main contributions are a) a combined sampling, training and inference procedure for applying topic models to huge and imbalanced diachronic text collections; b) a discussion on the differences between two topic models for this type of data; c) quantifying topic prominence for a period and thus a generalization of document-wise topic assignment to a discourse level; and d) a discussion of the role of humanistic interpretation with regard to analysing discourse…
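Contribution (c), lifting document-wise topic assignments to the level of a period, can be sketched as a simple aggregation. The example below assumes that each document already has a topic distribution (for instance from LDA inference) and defines a topic's prominence in a year as its mean share across that year's documents. This definition, the name `topic_prominence`, and the toy numbers are assumptions; the abstract does not state the exact measure used.

```python
from collections import defaultdict


def topic_prominence(doc_topics, years):
    """Average per-document topic distributions within each year.

    `doc_topics[i]` is the topic distribution of document i and
    `years[i]` is its publication year. Hypothetical helper.
    """
    sums = {}
    counts = defaultdict(int)
    for dist, year in zip(doc_topics, years):
        if year not in sums:
            sums[year] = [0.0] * len(dist)
        # Accumulate each topic's probability mass for this year.
        for k, p in enumerate(dist):
            sums[year][k] += p
        counts[year] += 1
    # Divide by the number of documents to get the mean share per topic.
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


# Toy input: three documents, two topics.
doc_topics = [[0.5, 0.5], [1.0, 0.0], [0.25, 0.75]]
years = [1854, 1854, 1917]
prominence = topic_prominence(doc_topics, years)
# prominence[1854] is [0.75, 0.25]; prominence[1917] is [0.25, 0.75]
```

Averaging rather than summing keeps years with very different document counts comparable, which matters for an imbalanced collection.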
Taxonomy
Topics: Computational and Text Analysis Methods
