AutoTM 2.0: Automatic Topic Modeling Framework for Documents Analysis

Maria Khodorchenko; Nikolay Butakov; Maxim Zuev; Denis Nasonov

arXiv:2410.00655·cs.LG·October 2, 2024

Open Access

TL;DR

AutoTM 2.0 is an advanced framework for automatic topic modeling that improves optimization, quality assessment, and usability, enabling better analysis of multilingual text datasets.

Contribution

The paper introduces AutoTM 2.0, which adds a novel optimization pipeline, LLM-based quality metrics, and a distributed mode to improve automatic topic modeling across diverse datasets.

Findings

01 AutoTM 2.0 outperforms the previous AutoTM on 5 datasets with different features.

02 It incorporates LLM-based (GPT-4-based) quality metrics alongside coherence for evaluating topic models.

03 It supports a distributed mode for scalability.

Abstract

In this work, we present AutoTM 2.0, a framework for optimizing additively regularized topic models. Compared to the previous version, it includes such valuable improvements as a novel optimization pipeline, LLM-based quality metrics, and a distributed mode. AutoTM 2.0 is a convenient tool for specialists as well as non-specialists who work with text documents, whether to conduct exploratory data analysis or to perform a clustering task on an interpretable set of features. Quality evaluation is based on specially developed metrics such as coherence and GPT-4-based approaches. Researchers and practitioners can easily integrate new optimization algorithms and adapt novel metrics to enhance modeling quality and extend their experiments. We show that AutoTM 2.0 achieves better performance than the previous AutoTM by providing results on 5 datasets with different features and in two different…

Taxonomy

Topics: Data Quality and Management · Advanced Text Analysis Techniques

Methods: Sparse Evolutionary Training