Stability of Topic Modeling via Matrix Factorization

Mark Belford; Brian Mac Namee; Derek Greene

arXiv:1702.07186·cs.IR·September 12, 2017·2 cites

TL;DR

This paper investigates the instability in matrix factorization-based topic models caused by stochastic initialization and proposes an ensemble learning approach to improve stability and accuracy.

Contribution

It introduces new measures for assessing topic model stability and demonstrates that a K-Fold ensemble strategy effectively reduces instability and enhances model accuracy.
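
The stability measures themselves are not spelled out on this page. As a rough illustration of what such a measure can look like, the Python sketch below scores how well the top-term lists ("descriptors") from several runs of a topic model agree with one another. It is a minimal sketch under stated assumptions, not necessarily a measure defined in the paper: the use of Jaccard similarity, the brute-force matching of topics between runs, the function names, and the example term lists are all hypothetical choices made for illustration.

```python
from itertools import combinations, permutations


def jaccard(a, b):
    """Jaccard similarity between two collections of terms."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def run_agreement(run_x, run_y):
    """Best average Jaccard over all one-to-one topic matchings of two runs.

    Each run is a list of topics; each topic is a list of its top terms.
    Topic order differs between runs, so every matching is tried.
    Brute force is only practical for a small number of topics.
    """
    if len(run_x) != len(run_y) or not run_x:
        raise ValueError("runs must be non-empty and have the same number of topics")
    k = len(run_x)
    best = 0.0
    for perm in permutations(range(k)):
        score = sum(jaccard(run_x[i], run_y[j]) for i, j in enumerate(perm)) / k
        best = max(best, score)
    return best


def descriptor_stability(runs):
    """Mean agreement over all pairs of runs (1.0 means identical topics)."""
    pairs = list(combinations(runs, 2))
    if not pairs:
        raise ValueError("need at least two runs")
    return sum(run_agreement(x, y) for x, y in pairs) / len(pairs)


# Hypothetical top-term lists from three runs that differ only in random seed.
runs = [
    [["goal", "match", "team"], ["vote", "party", "election"]],
    [["vote", "party", "election"], ["goal", "match", "team"]],
    [["goal", "match", "league"], ["vote", "party", "minister"]],
]
print(round(descriptor_stability(runs), 3))  # prints 0.667
```

A score near 1.0 indicates that repeated runs recover essentially the same topics, while lower scores indicate that the output depends on the initialization. For larger numbers of topics, the brute-force matching would need to be replaced by an assignment solver.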

Findings

1. Ensemble methods significantly improve topic model stability.
2. Structured initialization reduces variability in results.
3. Ensemble strategies lead to more accurate topic extraction.

Abstract

Topic models can provide us with an insight into the underlying latent structure of a large corpus of documents. A range of methods have been proposed in the literature, including probabilistic topic models and techniques based on matrix factorization. However, in both cases, standard implementations rely on stochastic elements in their initialization phase, which can potentially lead to different results being generated on the same corpus when using the same parameter values. This corresponds to the concept of "instability" which has previously been studied in the context of $k$-means clustering. In many applications of topic modeling, this problem of instability is not considered and topic models are treated as being definitive, even though the results may change considerably if the initialization process is altered. In this paper we demonstrate the inherent instability of popular…

Code & Models

Repositories

derekgreene/topic-ensemble (Official)

Taxonomy

Topics: Topic Modeling · Advanced Graph Neural Networks · Advanced Text Analysis Techniques