Iterative Improvement of an Additively Regularized Topic Model

Alex Gorbulev; Vasiliy Alekseev; Konstantin Vorontsov

arXiv:2408.05840·cs.CL·September 27, 2024

TL;DR

The paper introduces ITAR, an iterative method for training additively regularized topic models that improves stability, diversity, and performance over existing models like LDA, ARTM, and BERTopic.

Contribution

It presents a novel iterative training approach that enhances topic model quality by retaining good topics across iterations using additive regularization.
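The additive regularization mentioned above comes from ARTM, which factorizes p(w|d) into a word-topic matrix Φ = (φ_wt) and a topic-document matrix Θ = (θ_td) and maximizes the log-likelihood plus a weighted sum of regularizers R_i. The objective and the regularized EM updates below are the standard ARTM form. The retention term for a set G of previously found good topics with stored distributions φ*_wt is an illustrative assumption showing how such a link between models could work; it is not necessarily the regularizer ITAR uses.

```latex
\sum_{d \in D} \sum_{w \in d} n_{dw} \ln \sum_{t \in T} \varphi_{wt}\,\theta_{td}
  + \sum_{i} \tau_i R_i(\Phi, \Theta) \;\to\; \max_{\Phi,\,\Theta}

% E-step
p_{tdw} = \operatorname{norm}_{t \in T}\bigl(\varphi_{wt}\,\theta_{td}\bigr), \qquad
n_{wt} = \sum_{d \in D} n_{dw}\, p_{tdw}, \qquad
n_{td} = \sum_{w \in d} n_{dw}\, p_{tdw}

% M-step, with R = \sum_i \tau_i R_i and \operatorname{norm}_{i}(x_i) = \max(x_i, 0) / \sum_j \max(x_j, 0)
\varphi_{wt} = \operatorname{norm}_{w \in W}\Bigl(n_{wt} + \varphi_{wt}\,\frac{\partial R}{\partial \varphi_{wt}}\Bigr), \qquad
\theta_{td} = \operatorname{norm}_{t \in T}\Bigl(n_{td} + \theta_{td}\,\frac{\partial R}{\partial \theta_{td}}\Bigr)

% Illustrative assumption: add \tau \sum_{t \in G} \sum_{w \in W} \varphi^{*}_{wt} \ln \varphi_{wt} to R
\varphi_{wt}\,\frac{\partial R}{\partial \varphi_{wt}} = \tau\,\varphi^{*}_{wt}
\;\Longrightarrow\;
\varphi_{wt} = \operatorname{norm}_{w \in W}\bigl(n_{wt} + \tau\,\varphi^{*}_{wt}\bigr), \quad t \in G
```

A large τ keeps each retained topic close to its stored distribution, while the remaining topics are fitted freely.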

Findings

1. ITAR outperforms LDA, ARTM, and BERTopic in experiments.
2. Topics generated by ITAR are more diverse.
3. ITAR achieves moderate perplexity, indicating that the model explains the data well.
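The findings rely on measuring fit and diversity. Below is a minimal sketch of both, assuming Φ and Θ are stored as nested lists (`phi[w][t]`, `theta[t][d]`) and the corpus as a D×W list of counts. Perplexity is the standard exponent of the negative per-token log-likelihood. Diversity is computed here as the share of distinct words among all topics' top words, one common definition that may differ from the measure used in the paper. The function names are hypothetical.

```python
import math


def perplexity(counts, phi, theta):
    """exp(-(1/N) * sum_{d,w} n_dw * ln p(w|d)), where p(w|d) = sum_t phi_wt * theta_td."""
    log_likelihood, total = 0.0, 0
    for d, row in enumerate(counts):
        for w, n_dw in enumerate(row):
            if n_dw == 0:
                continue
            # Assumes the model gives every observed word a positive probability.
            p_wd = sum(phi[w][t] * theta[t][d] for t in range(len(theta)))
            log_likelihood += n_dw * math.log(p_wd)
            total += n_dw
    return math.exp(-log_likelihood / total)


def topic_diversity(phi, top_n=10):
    """Fraction of distinct words among all topics' top-n words (1.0 means no overlap)."""
    vocab, num_topics = len(phi), len(phi[0])
    top_words = []
    for t in range(num_topics):
        # Rank word indices by their probability within topic t.
        ranked = sorted(range(vocab), key=lambda w: phi[w][t], reverse=True)
        top_words.extend(ranked[:top_n])
    return len(set(top_words)) / len(top_words)
```

Lower perplexity means a better fit. A model that is no better than a uniform distribution over W words scores exactly W, which gives a rough scale for reading "moderate" perplexity.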

Abstract

Topic modelling is fundamentally a soft clustering problem (of known objects, the documents, over unknown clusters, the topics). That is, the task is ill-posed. In particular, topic models are unstable and incomplete. As a result, the process of finding a good topic model (repeated hyperparameter selection, model training, and topic quality assessment) can be particularly long and labor-intensive. We aim to simplify this process and make it more deterministic and provable. To this end, we present a method for iterative training of a topic model. The essence of the method is that a series of related topic models is trained so that each subsequent model is at least as good as the previous one, i.e., it retains all the good topics found earlier. The connection between the models is achieved by additive regularization. The result of this iterative training is…
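The training scheme described in the abstract can be sketched in pure Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The inner model is PLSA-style EM with one additive regularizer. Retained topics are pinned by a smoothing term τ Σ_w φ*_wt ln φ_wt that keeps them close to their stored distributions. The names `train_regularized_plsa`, `iterative_training`, and the `is_good` quality test are all hypothetical.

```python
import random


def normalize(values):
    """Scale non-negative values to sum to 1 (uniform fallback if all are zero)."""
    total = sum(values)
    if total <= 0:
        return [1.0 / len(values)] * len(values)
    return [v / total for v in values]


def train_regularized_plsa(counts, num_topics, anchors=None, tau=0.0,
                           iterations=50, seed=0):
    """EM for p(w|d) = sum_t phi[w][t] * theta[t][d] with an additive smoothing term.

    counts  -- D x W nested list of word counts n_dw
    anchors -- hypothetical {topic index: word distribution} kept from an earlier
               model; the M-step for those topics becomes phi_wt ~ n_wt + tau * phi*_wt
    """
    anchors = anchors or {}
    rng = random.Random(seed)
    num_docs, vocab = len(counts), len(counts[0])
    # Random initial topics; anchored topics start from their stored distribution.
    columns = [normalize([rng.random() + 0.1 for _ in range(vocab)])
               for _ in range(num_topics)]
    for t, stored in anchors.items():
        columns[t] = list(stored)
    phi = [[columns[t][w] for t in range(num_topics)] for w in range(vocab)]
    theta = [[1.0 / num_topics] * num_docs for _ in range(num_topics)]

    for _ in range(iterations):
        n_wt = [[0.0] * num_topics for _ in range(vocab)]
        n_td = [[0.0] * num_docs for _ in range(num_topics)]
        # E-step: split every count n_dw over topics in proportion to phi_wt * theta_td.
        for d in range(num_docs):
            for w in range(vocab):
                if counts[d][w] == 0:
                    continue
                weights = [phi[w][t] * theta[t][d] for t in range(num_topics)]
                z = sum(weights)
                if z == 0:
                    continue
                for t in range(num_topics):
                    share = counts[d][w] * weights[t] / z
                    n_wt[w][t] += share
                    n_td[t][d] += share
        # M-step: plain counts for free topics, counts + tau * phi* for anchored ones.
        for t in range(num_topics):
            column = normalize([
                n_wt[w][t] + (tau * anchors[t][w] if t in anchors else 0.0)
                for w in range(vocab)])
            for w in range(vocab):
                phi[w][t] = column[w]
        for d in range(num_docs):
            column = normalize([n_td[t][d] for t in range(num_topics)])
            for t in range(num_topics):
                theta[t][d] = column[t]
    return phi, theta


def iterative_training(counts, num_topics, is_good, rounds=3, tau=1e5,
                       iterations=50):
    """Hypothetical outer loop: topics that pass `is_good` are anchored in later rounds."""
    good = {}  # topic index -> word distribution retained from an earlier round
    phi = theta = None
    for r in range(rounds):
        phi, theta = train_regularized_plsa(counts, num_topics, good, tau,
                                            iterations, seed=r)
        for t in range(num_topics):
            column = [phi[w][t] for w in range(len(phi))]
            if t not in good and is_good(column):
                good[t] = column
    return phi, theta, good
```

In this sketch, a retained topic's column can drift from its stored copy by at most N/(N + τ), where N is the number of tokens, so a large `tau` is what keeps each new model "at least as good" as the previous one on the retained topics. Free slots keep being refitted, and `is_good` would in practice be a real quality criterion such as a coherence threshold.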

Code & Models

Repositories

machine-intelligence-laboratory/OptimalNumberOfTopics (official)

Taxonomy

Topics: Expert finding and Q&A systems