Short Text Topic Modeling Techniques, Applications, and Performance: A Survey
Qiang Jipeng, Qian Zhenyu, Li Yun, Yuan Yunhao, Wu Xindong

TL;DR
This survey reviews recent techniques for short text topic modeling, groups them into categories, compares their performance, and introduces an open-source library to support further research and development in this area.
Contribution
The survey classifies short text topic modeling methods into three categories, introduces STTM, described as the first open-source library for the task, and benchmarks various algorithms on real-world datasets.
Findings
Different methods show varying effectiveness depending on the dataset.
The open-source library STTM integrates multiple algorithms for easy comparison.
Short text models outperform traditional long text models on short text datasets.
Abstract
Analyzing short texts to infer discriminative and coherent latent topics is a critical and fundamental task, since many real-world applications require semantic understanding of short texts. Traditional long text topic modeling algorithms (e.g., PLSA and LDA), which rely on word co-occurrences, cannot solve this problem very well because only very limited word co-occurrence information is available in short texts. Short text topic modeling, which aims to overcome the problem of sparseness in short texts, has therefore attracted much attention from the machine learning research community in recent years. In this survey, we conduct a comprehensive review of various short text topic modeling techniques proposed in the literature. We present three categories of methods based on Dirichlet multinomial mixture, global word co-occurrences, and self-aggregation, with example of representative…
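The sparsity argument in the abstract can be illustrated with a minimal sketch. It is not the survey's code or the STTM library's API: the function names `biterms` and `global_cooccurrence`, the toy corpus, and the choice to deduplicate words within a text are all assumptions made for illustration. Each short text yields only a handful of word pairs, while pooling pairs over the whole corpus gives the kind of global co-occurrence counts that the second category of methods builds on.

```python
from collections import Counter
from itertools import combinations


def biterms(tokens):
    """Return the unordered word pairs that co-occur in one short text.

    Assumption for this sketch: repeated words are collapsed, and each
    pair is stored in sorted order so ("a", "b") and ("b", "a") match.
    """
    unique = sorted(set(tokens))
    return list(combinations(unique, 2))


def global_cooccurrence(corpus):
    """Pool word-pair counts over every text in the corpus."""
    counts = Counter()
    for doc in corpus:
        counts.update(biterms(doc.lower().split()))
    return counts


# Hypothetical toy corpus of short texts.
corpus = [
    "apple releases new phone",
    "new phone battery review",
    "apple phone battery",
]

counts = global_cooccurrence(corpus)

# A single short text contributes very few pairs ...
print(len(biterms(corpus[2].split())))
# ... but pooled counts reveal pairs that recur across texts.
print(counts.most_common(3))
```

No single text here contains a pair more than once, so a model restricted to per-document co-occurrence sees almost no signal; the pooled counts are where recurring pairs become visible.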
Taxonomy
Topics: Topic Modeling · Advanced Text Analysis Techniques · Sentiment Analysis and Opinion Mining
