Towards the TopMost: A Topic Modeling System Toolkit
Xiaobao Wu, Fengjun Pan, Anh Tuan Luu

TL;DR
This paper introduces TopMost, a comprehensive toolkit for topic modeling that supports various models, datasets, and evaluation methods, facilitating easier use, comparison, and extension of topic modeling techniques.
Contribution
The paper presents TopMost, a modular and extensive toolkit that covers the full lifecycle of topic modeling, improving usability and comparability over existing solutions.
Findings
Supports a wide range of topic models and datasets
Enables rapid and fair comparisons of models
Facilitates flexible extensions and applications
Abstract
Topic models have a rich history with various applications and have recently been reinvigorated by neural topic modeling. However, these numerous topic models adopt entirely distinct datasets, implementations, and evaluations. This impedes quick utilization and fair comparisons, and thereby hinders their research progress and applications. To tackle this challenge, in this paper we propose a Topic Modeling System Toolkit (TopMost). Compared to existing toolkits, TopMost stands out by supporting more extensive features. It covers a broader spectrum of topic modeling scenarios with their complete lifecycles, including datasets, preprocessing, models, training, and evaluations. Thanks to its highly cohesive and decoupled modular design, TopMost enables rapid utilization, fair comparisons, and flexible extensions of diverse cutting-edge topic models. Our code, tutorials, and documentation…
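The abstract's claim that a decoupled lifecycle (preprocessing, model, evaluation) enables fair comparisons can be illustrated with a minimal sketch. Everything below (`preprocess`, `TopicModel`, `FrequencyBaseline`, `run_pipeline`) is hypothetical and is not TopMost's actual API. `FrequencyBaseline` is a placeholder, not a real topic model, and the evaluation stage uses topic diversity (the fraction of unique words among all topics' top words), chosen here as an assumption. The idea is that any model exposing the same `fit`/`top_words` interface passes through identical preprocessing and evaluation, so scores are directly comparable.

```python
from collections import Counter
from typing import Protocol


def preprocess(raw_docs: list[str], stopwords: set[str]) -> list[list[str]]:
    """Hypothetical preprocessing stage: lowercase, split on whitespace, drop stopwords."""
    return [[w for w in doc.lower().split() if w not in stopwords] for doc in raw_docs]


class TopicModel(Protocol):
    """Hypothetical interface every model must satisfy (NOT TopMost's actual API)."""

    def fit(self, docs: list[list[str]]) -> None: ...

    def top_words(self, k: int) -> list[list[str]]: ...


class FrequencyBaseline:
    """Placeholder 'model': deals words, ranked by frequency, round-robin into topics.
    It is not a real topic model; it only exercises the shared pipeline."""

    def __init__(self, num_topics: int) -> None:
        self.num_topics = num_topics
        self.ranked: list[str] = []

    def fit(self, docs: list[list[str]]) -> None:
        counts = Counter(w for doc in docs for w in doc)
        # most_common() is stable, so ties keep first-seen order
        self.ranked = [w for w, _ in counts.most_common()]

    def top_words(self, k: int) -> list[list[str]]:
        # Topic t receives ranked words t, t + num_topics, t + 2 * num_topics, ...
        topics = [self.ranked[t::self.num_topics] for t in range(self.num_topics)]
        return [topic[:k] for topic in topics]


def topic_diversity(topics: list[list[str]]) -> float:
    """Fraction of unique words across all topics' top-word lists."""
    all_words = [w for topic in topics for w in topic]
    return len(set(all_words)) / len(all_words) if all_words else 0.0


def run_pipeline(model: TopicModel, raw_docs: list[str],
                 stopwords: set[str], k: int = 3) -> dict:
    """Same preprocessing and evaluation for every model, so results are comparable."""
    docs = preprocess(raw_docs, stopwords)
    model.fit(docs)
    topics = model.top_words(k)
    return {"topics": topics, "diversity": topic_diversity(topics)}


if __name__ == "__main__":
    raw = ["apple banana apple", "banana cherry the", "the apple cherry date"]
    print(run_pipeline(FrequencyBaseline(num_topics=2), raw, {"the"}, k=2))
```

Swapping in a different model only requires another class with `fit` and `top_words`; the preprocessing and metric code stay untouched, which is the kind of decoupling the abstract credits for fair comparisons.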
Taxonomy
Topics: Topic Modeling · Computational and Text Analysis Methods
