Measuring LDA Topic Stability from Clusters of Replicated Runs

Mika Mäntylä; Maëlick Claes; Umar Farooq

arXiv:1808.08098·cs.CL·September 4, 2018

TL;DR

This paper introduces a method to measure the stability of LDA topics by clustering the topics from replicated LDA runs and applying a stability metric, helping to assess the reliability of topics in large text datasets.

Contribution

It proposes a novel approach combining replicated LDA runs, clustering, and stability metrics to evaluate topic stability, enhancing interpretability and reproducibility of LDA results.

Findings

1. The method is applied to 270,000 Mozilla Firefox commit messages.
2. Of the stability metrics tried, Rank-Biased Overlap is the one recommended for measuring topic stability.
3. The method provides a transparent assessment of LDA topic reliability.
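
For reference, Rank-Biased Overlap (RBO), as defined by Webber, Moffat and Zobel, compares two ranked lists S and T (here, the most probable words of two topics) using a persistence parameter p in (0, 1) that controls how heavily the top ranks are weighted. With X_d the size of the overlap between the two top-d prefixes, the full measure and its extrapolated form for two lists of equal depth k are given below. Which RBO variant and which value of p the authors use is not stated in this summary, so treat this as background rather than the paper's exact formulation.

$$
\mathrm{RBO}(S,T;p) = (1-p)\sum_{d=1}^{\infty} p^{\,d-1}\,\frac{X_d}{d},
\qquad X_d = \lvert S_{1:d}\cap T_{1:d}\rvert
$$

$$
\mathrm{RBO}_{\mathrm{ext}}(S,T;p,k) = \frac{X_k}{k}\,p^{k} + \frac{1-p}{p}\sum_{d=1}^{k}\frac{X_d}{d}\,p^{d}
$$

Identical lists give X_d = d at every depth and a score of 1; disjoint lists give 0. For example, with S = (a, b), T = (b, a), p = 0.5 and k = 2: X_1 = 0 and X_2 = 2, so RBO_ext = (2/2)(0.25) + 1 · (0 · 0.5 + 1 · 0.25) = 0.5.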

Abstract

Background: Unstructured and textual data are increasing rapidly, and Latent Dirichlet Allocation (LDA) topic modeling is a popular data analysis method for such data. Past work suggests that instability of LDA topics may lead to systematic errors. Aim: We propose a method that relies on replicated LDA runs, clustering, and a stability metric for the topics. Method: We generate k LDA topics and replicate this process n times, resulting in n*k topics. We then use K-medoids to cluster the n*k topics into k clusters. The k clusters now represent the original LDA topics, and we present them like normal LDA topics, showing the ten most probable words. For the clusters, we try multiple stability metrics, out of which we recommend Rank-Biased Overlap, which shows the stability of the topics inside each cluster. Results: We provide an initial validation where our method is used for 270,000 Mozilla…
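
The pipeline in the abstract can be sketched as follows. This is a minimal illustration under several assumptions, not the authors' implementation: each run is supplied as a k × V topic-word probability matrix from any LDA implementation; topics are compared with cosine distance; K-medoids is a simple alternating (Voronoi-iteration) variant with farthest-first initialization; each cluster is displayed through the top words of its medoid topic; and cluster stability is the mean pairwise extrapolated RBO of the member topics' top-word rankings with p = 0.9. The function names `rbo_ext`, `cosine_distances`, `k_medoids` and `stable_topics` are hypothetical and do not come from the authors' repository.

```python
from itertools import combinations

import numpy as np


def rbo_ext(s, t, p=0.9):
    """Extrapolated Rank-Biased Overlap of two equal-length rankings (no repeated items)."""
    if len(s) != len(t) or not s:
        raise ValueError("this sketch only handles non-empty, equal-length rankings")
    k = len(s)
    # agreement[d-1] = X_d / d, the overlap proportion of the two top-d prefixes
    agreement = [len(set(s[:d]) & set(t[:d])) / d for d in range(1, k + 1)]
    weighted = sum(a * p**d for d, a in enumerate(agreement, start=1))
    return agreement[-1] * p**k + (1 - p) / p * weighted


def cosine_distances(X):
    """Pairwise cosine distance between rows (assumed distance between topics)."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    return np.clip(1.0 - Xn @ Xn.T, 0.0, 2.0)


def k_medoids(D, k, max_iter=100):
    """Alternating K-medoids on a precomputed distance matrix D."""
    # Deterministic farthest-first initialization: start from the most central
    # point, then repeatedly add the point farthest from all chosen medoids.
    medoids = [int(np.argmin(D.sum(axis=1)))]
    while len(medoids) < k:
        medoids.append(int(np.argmax(D[:, medoids].min(axis=1))))
    medoids = np.array(medoids)
    for _ in range(max_iter):
        labels = np.argmin(D[:, medoids], axis=1)  # assign each point to nearest medoid
        new = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size:
                # New medoid: the member with the smallest total distance to its cluster mates
                new[c] = members[np.argmin(D[np.ix_(members, members)].sum(axis=1))]
        if np.array_equal(new, medoids):
            break
        medoids = new
    return medoids, np.argmin(D[:, medoids], axis=1)


def stable_topics(runs, top_n=10, p=0.9):
    """Cluster n runs of k topics (n*k topics in total) into k clusters and score each."""
    k = runs[0].shape[0]
    topics = np.vstack(runs)  # shape (n*k, V)
    medoids, labels = k_medoids(cosine_distances(topics), k)
    results = []
    for c, m in enumerate(medoids):
        members = np.flatnonzero(labels == c)
        # Each member topic becomes a ranking of its top_n most probable word ids
        ranks = [np.argsort(-topics[i])[:top_n].tolist() for i in members]
        pair_scores = [rbo_ext(a, b, p) for a, b in combinations(ranks, 2)]
        results.append({
            "top_words": np.argsort(-topics[m])[:top_n].tolist(),  # medoid as representative
            "size": int(members.size),
            "stability": float(np.mean(pair_scores)) if pair_scores else 1.0,
        })
    return results
```

With real data, each element of `runs` would be the normalized topic-word matrix of one LDA run trained with a different random seed, and word ids would be mapped back to vocabulary strings for display. A cluster with low stability flags a topic whose top words do not reproduce across runs. Farthest-first initialization is used here only because it makes the sketch deterministic; a library K-medoids (for example, PAM) would be a reasonable substitute.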

Code & Models

Repositories

M3SOulu/Measuring-LDA-Topic-Stability (official)

Taxonomy

Methods: Latent Dirichlet Allocation