Quantitative Tools for Time Series Analysis in Natural Language Processing: A Practitioners Guide
W. Benedikt Schmal

TL;DR
This paper introduces quantitative time series econometric methods, including tests for non-stationarity and structural breaks, to enhance the analysis of topic evolution in natural language processing within social sciences.
Contribution
It provides a comprehensive guide with practical coding advice for applying econometric time series tools to NLP topic analysis, filling a gap in methodological rigor.
Findings
Demonstrates how to detect structural breaks in topic prevalence
Shows the importance of testing for non-stationarity in time series data
Provides a practical example using R software
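To make the non-stationarity finding concrete, here is a minimal sketch of a plain (non-augmented) Dickey-Fuller regression with a constant. It is written in dependency-free Python for illustration and is not the paper's R code. The simulated series, the function name `dickey_fuller_t`, and the variable names are assumptions. A real analysis would typically use an augmented test with lag selection from an established package.

```python
import math
import random

def dickey_fuller_t(y):
    """t-statistic on rho in the regression diff(y)_t = alpha + rho * y_{t-1} + e_t."""
    x = y[:-1]                                    # lagged level y_{t-1}
    d = [b - a for a, b in zip(y[:-1], y[1:])]    # first difference y_t - y_{t-1}
    n = len(d)
    mean_x = sum(x) / n
    mean_d = sum(d) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxd = sum((xi - mean_x) * (di - mean_d) for xi, di in zip(x, d))
    rho = sxd / sxx                               # OLS slope
    alpha = mean_d - rho * mean_x                 # OLS intercept
    ssr = sum((di - alpha - rho * xi) ** 2 for xi, di in zip(x, d))
    se_rho = math.sqrt(ssr / (n - 2) / sxx)       # standard error of the slope
    return rho / se_rho

# Asymptotic 5% Dickey-Fuller critical value (constant, no trend).
# The statistic does not follow a standard t distribution under the null.
DF_CRIT_5PCT = -2.86

# Hypothetical data: one stationary series and one random walk built from the same shocks.
rng = random.Random(0)
shocks = [rng.gauss(0, 1) for _ in range(500)]
white_noise = list(shocks)
random_walk = []
level = 0.0
for e in shocks:
    level += e
    random_walk.append(level)

t_noise = dickey_fuller_t(white_noise)
t_walk = dickey_fuller_t(random_walk)
print("white noise:", round(t_noise, 2), "reject unit root:", t_noise < DF_CRIT_5PCT)
print("random walk:", round(t_walk, 2), "reject unit root:", t_walk < DF_CRIT_5PCT)
```

A statistic below the critical value rejects the null of a unit root. A topic-prevalence series that fails to reject would usually be differenced before further inference.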
Abstract
Natural language processing tools have become frequently used in social sciences such as economics, political science, and sociology. Many publications apply topic modeling to elicit latent topics in text corpora and their development over time. Here, most publications rely on visual inspections and draw inference on changes, structural breaks, and developments over time. We suggest using univariate time series econometrics to introduce more quantitative rigor that can strengthen the analyses. In particular, we discuss the econometric topics of non-stationarity as well as structural breaks. This paper serves as a comprehensive practitioners guide to provide researchers in the social and life sciences as well as the humanities with concise advice on how to implement econometric time series methods to thoroughly investigate topic prevalences over time. We provide coding advice for the…
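As a rough illustration of replacing visual inspection with a quantitative structural break check, the sketch below estimates a single break in the mean of a topic-prevalence series by least squares: it tries every admissible break date and keeps the one with the largest F-type statistic. This is an assumption-laden sketch in plain Python, not the paper's R implementation. The function `best_mean_break`, the 15% trimming, and the synthetic `shares` series are all hypothetical.

```python
def best_mean_break(y, trim=0.15):
    """Return (break_index, f_stat) for the single mean shift that minimizes the SSR.

    break_index is the first observation of the second regime.
    """
    n = len(y)
    lo = max(2, int(n * trim))            # skip candidate dates too close to the start
    hi = min(n - 2, int(n * (1 - trim)))  # and too close to the end
    grand_mean = sum(y) / n
    ssr_restricted = sum((v - grand_mean) ** 2 for v in y)  # no-break model
    best = None
    for k in range(lo, hi + 1):
        left, right = y[:k], y[k:]
        mean_left = sum(left) / len(left)
        mean_right = sum(right) / len(right)
        ssr = (sum((v - mean_left) ** 2 for v in left)
               + sum((v - mean_right) ** 2 for v in right))
        # F-type statistic for one restriction (equal means) with n - 2 residual df
        f_stat = (ssr_restricted - ssr) / (ssr / (n - 2))
        if best is None or f_stat > best[1]:
            best = (k, f_stat)
    return best

# Hypothetical topic share: about 10% for 30 periods, then about 30% for 30 periods,
# with a small deterministic wiggle so the residual variance is not zero.
shares = [(0.10 if t < 30 else 0.30) + 0.01 * (-1) ** t for t in range(60)]

break_index, f_stat = best_mean_break(shares)
print("estimated break at index", break_index, "with sup-F statistic", round(f_stat, 1))
```

Because the break date is chosen to maximize the statistic, the maximum does not follow a standard F distribution. It should be compared against critical values tabulated for tests with an unknown break date rather than an ordinary F table.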
Taxonomy
Topics: Time Series Analysis and Forecasting
