Changepoint Analysis of Topic Proportions in Temporal Text Data
Avinandan Bose, Soumendu Sundar Mukherjee

TL;DR
This paper introduces a scalable method for detecting changepoints in large-scale textual data by modeling topic proportions over time, enabling automated and interpretable identification of shifts in topic popularity.
Contribution
We develop a novel computationally efficient approach for offline changepoint detection in large textual datasets using a specialized temporal topic model and likelihood ratio testing.
Findings
Successfully identified historically significant changepoints in literature and physics datasets.
Method detects both known and novel shifts in topic structures.
Provides interpretable insights into the evolution of topics over time.
Abstract
Changepoint analysis deals with unsupervised detection and/or estimation of time-points in time-series data, when the distribution generating the data changes. In this article, we consider offline changepoint detection in the context of large scale textual data. We build a specialised temporal topic model with provisions for changepoints in the distribution of topic proportions. As full likelihood based inference in this model is computationally intractable, we develop a computationally tractable approximate inference procedure. More specifically, we use sample splitting to estimate topic polytopes first and then apply a likelihood ratio statistic together with a modified version of the wild binary segmentation algorithm of Fryzlewicz (2014). Our methodology facilitates automated detection of structural changes in large corpora without the need for manual processing by…
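The second stage described in the abstract scans a likelihood ratio statistic with wild binary segmentation. The sketch below illustrates that stage only, under several simplifying assumptions:

- Topic assignments are already available as per-time topic counts, so the polytope-estimation stage is skipped.
- Each segment's counts are modeled as multinomial with one fixed topic-proportion vector.
- Plain wild binary segmentation is used, not the authors' modified version.

The names `segment_loglik` and `wbs_changepoints`, the threshold, and the number of random intervals are hypothetical choices for illustration.

```python
import math
import random


def segment_loglik(totals):
    """Maximised multinomial log-likelihood of pooled topic counts (assumed model)."""
    n = sum(totals)
    if n == 0:
        return 0.0
    return sum(c * math.log(c / n) for c in totals if c > 0)


def wbs_changepoints(counts, threshold, num_intervals=200, seed=0):
    """Plain wild binary segmentation with a multinomial likelihood ratio statistic.

    Hypothetical simplification: counts[t][k] is the number of words (or documents)
    assigned to topic k at time t. A returned value b means the topic proportions
    change between time b-1 and time b.
    """
    T, K = len(counts), len(counts[0])
    # Prefix sums: cum[t][k] = total count of topic k over times 0..t-1.
    cum = [[0] * K]
    for row in counts:
        cum.append([c + r for c, r in zip(cum[-1], row)])

    def totals(s, e):
        return [cum[e][k] - cum[s][k] for k in range(K)]

    def lr_stat(s, b, e):
        # 2 * (log-lik of [s,b) and [b,e) fitted separately - log-lik of [s,e) pooled)
        return 2.0 * (segment_loglik(totals(s, b)) + segment_loglik(totals(b, e))
                      - segment_loglik(totals(s, e)))

    # Random intervals [a, z) drawn once up front, as in plain WBS.
    rng = random.Random(seed)
    intervals = [tuple(sorted(rng.sample(range(T + 1), 2))) for _ in range(num_intervals)]
    found = []

    def recurse(s, e):
        if e - s < 2:
            return
        # Candidate intervals: random ones lying inside [s, e), plus [s, e) itself.
        candidates = [(a, z) for (a, z) in intervals if s <= a and z <= e and z - a >= 2]
        candidates.append((s, e))
        best_stat, best_b = 0.0, None
        for a, z in candidates:
            for b in range(a + 1, z):
                stat = lr_stat(a, b, z)
                if stat > best_stat:
                    best_stat, best_b = stat, b
        # Split at the strongest split point and recurse on both halves.
        if best_b is not None and best_stat > threshold:
            found.append(best_b)
            recurse(s, best_b)
            recurse(best_b, e)

    recurse(0, T)
    return sorted(found)


if __name__ == "__main__":
    # Toy data: topic proportions shift from (0.5, 0.3, 0.2) to (0.2, 0.3, 0.5) at t = 30.
    counts = [[50, 30, 20]] * 30 + [[20, 30, 50]] * 30
    print(wbs_changepoints(counts, threshold=50.0))  # [30]
```

For a single fixed split point with no change, this statistic is approximately chi-square with K - 1 degrees of freedom. However, the maximum over split points and random intervals is stochastically larger. The threshold therefore has to be calibrated, for example by simulation, rather than read off a chi-square table.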
Taxonomy
Topics: Computational and Text Analysis Methods · Topic Modeling · Time Series Analysis and Forecasting
