Stability of Topic Modeling via Matrix Factorization
Mark Belford, Brian Mac Namee, Derek Greene

TL;DR
This paper investigates the instability in matrix factorization-based topic models caused by stochastic initialization and proposes an ensemble learning approach to improve stability and accuracy.
Contribution
It introduces new measures for assessing topic model stability and demonstrates that a K-Fold ensemble strategy effectively reduces instability and enhances model accuracy.
Findings
Ensemble methods significantly improve topic model stability.
Structured initialization reduces variability in results.
Ensemble strategies lead to more accurate topic extraction.
Abstract
Topic models can provide us with an insight into the underlying latent structure of a large corpus of documents. A range of methods have been proposed in the literature, including probabilistic topic models and techniques based on matrix factorization. However, in both cases, standard implementations rely on stochastic elements in their initialization phase, which can potentially lead to different results being generated on the same corpus when using the same parameter values. This corresponds to the concept of "instability" which has previously been studied in the context of k-means clustering. In many applications of topic modeling, this problem of instability is not considered and topic models are treated as being definitive, even though the results may change considerably if the initialization process is altered. In this paper we demonstrate the inherent instability of popular…
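The instability described in the abstract can be probed with a small experiment: factorize the same document-term matrix several times, changing only the random seed, and check how well the resulting topics agree. The sketch below is illustrative only and is not the paper's implementation. The function names (nmf_topics, agreement, stability), the toy matrix, and the score itself (mean Jaccard overlap of matched top-term sets) are assumptions made for this example, and the paper's own stability measures may be defined differently. The factorization uses standard multiplicative-update NMF in pure Python.

```python
import random
from itertools import combinations, permutations

def matmul(X, Y):
    cols = list(zip(*Y))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in X]

def transpose(X):
    return [list(col) for col in zip(*X)]

def nmf_topics(A, k, seed, top=3, iters=200, eps=1e-9):
    """Factorize A ~ W H with multiplicative updates from a random start.
    Returns one frozenset of top-term indices per topic (hypothetical helper)."""
    rng = random.Random(seed)  # the seed is the only source of variation
    n, m = len(A), len(A[0])
    W = [[rng.random() + 1e-3 for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + 1e-3 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        Wt = transpose(W)
        num, den = matmul(Wt, A), matmul(matmul(Wt, W), H)
        H = [[h * a / (d + eps) for h, a, d in zip(hr, nr, dr)]
             for hr, nr, dr in zip(H, num, den)]
        Ht = transpose(H)
        num, den = matmul(A, Ht), matmul(W, matmul(H, Ht))
        W = [[w * a / (d + eps) for w, a, d in zip(wr, nr, dr)]
             for wr, nr, dr in zip(W, num, den)]
    # Describe each topic by its highest-weighted terms.
    return [frozenset(sorted(range(m), key=lambda j: -row[j])[:top]) for row in H]

def jaccard(a, b):
    return len(a & b) / len(a | b)

def agreement(run_a, run_b):
    """Best mean Jaccard over all topic matchings (brute force, small k only)."""
    k = len(run_a)
    return max(sum(jaccard(run_a[i], run_b[p[i]]) for i in range(k)) / k
               for p in permutations(range(k)))

def stability(runs):
    """Mean pairwise agreement: 1.0 if every run yields the same topics."""
    pairs = list(combinations(runs, 2))
    return sum(agreement(a, b) for a, b in pairs) / len(pairs)

# Toy document-term counts (made up for illustration): 6 documents, 6 terms.
A = [
    [3, 2, 1, 0, 0, 1],
    [2, 3, 0, 1, 0, 0],
    [1, 2, 3, 0, 1, 0],
    [0, 0, 1, 3, 2, 2],
    [0, 1, 0, 2, 3, 1],
    [1, 0, 0, 1, 2, 3],
]

runs = [nmf_topics(A, k=2, seed=s) for s in range(5)]
score = stability(runs)
print(f"stability across seeds: {score:.3f}")
```

A score near 1.0 means the seeds barely affect the extracted topics; lower values indicate that the top terms change from run to run. On real corpora the brute-force matching in agreement would need replacing with an assignment solver, since the number of permutations grows factorially with the number of topics.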
Taxonomy
Topics: Topic Modeling · Advanced Graph Neural Networks · Advanced Text Analysis Techniques
