Let the Pretrained Language Models "Imagine" for Short Texts Topic Modeling
Pritom Saha Akash, Jie Huang, Kevin Chen-Chuan Chang

TL;DR
This paper introduces a novel approach for short text topic modeling by leveraging pre-trained language models to extend texts into longer sequences, significantly improving topic coherence and quality in sparse data scenarios.
Contribution
It proposes a new method that combines PLMs with neural topic models to effectively address data sparsity in short texts, outperforming existing models.
Findings
The model achieves higher topic coherence scores.
It outperforms state-of-the-art short-text topic models.
It remains effective in extreme data-sparsity scenarios.
Abstract
Topic models are among the most compelling methods for discovering latent semantics in a document collection. However, they assume that a document has sufficient co-occurrence information to be effective. In short texts, co-occurrence information is minimal, which results in feature sparsity in the document representation. Therefore, existing topic models (probabilistic or neural) mostly fail to mine patterns from them to generate coherent topics. In this paper, we take a new approach to short-text topic modeling to address the data-sparsity issue by extending short texts into longer sequences using existing pre-trained language models (PLMs). In addition, we provide a simple solution that extends a neural topic model to reduce the effect of noisy, off-topic text generated by PLMs. We observe that our model can substantially improve the performance of short-text topic modeling. Extensive…
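
The text-extension idea in the abstract can be illustrated with a minimal sketch. It assumes a whitespace tokenizer and a bag-of-words representation, and it uses `fake_plm`, a hypothetical lookup table standing in for a real pre-trained language model so the example runs without external dependencies. The helper names (`bag_of_words`, `extend_corpus`, `avg_distinct_terms`) are illustrative and not from the paper, and the paper's extension to the neural topic model for handling noisy generations is not shown.

```python
from collections import Counter
from typing import Callable, List


def bag_of_words(text: str) -> Counter:
    """Lowercase, split on whitespace, and count terms."""
    return Counter(text.lower().split())


def extend_corpus(docs: List[str], generate: Callable[[str], str]) -> List[str]:
    """Append a generated continuation to each short text."""
    return [f"{doc} {generate(doc)}" for doc in docs]


# Hypothetical stand-in for a PLM: canned continuations keyed by prompt.
# A real system would call a text-generation model here instead.
_CANNED_CONTINUATIONS = {
    "apple unveils new phone": "with a faster chip and a better camera",
    "team wins cup final": "after a late goal in extra time",
}


def fake_plm(prompt: str) -> str:
    """Return a canned continuation (assumption: replaces a real PLM call)."""
    return _CANNED_CONTINUATIONS.get(prompt, "")


def avg_distinct_terms(bows: List[Counter]) -> float:
    """Average number of distinct terms per document."""
    return sum(len(bow) for bow in bows) / len(bows)


short_docs = ["apple unveils new phone", "team wins cup final"]
long_docs = extend_corpus(short_docs, fake_plm)

short_bows = [bag_of_words(doc) for doc in short_docs]
long_bows = [bag_of_words(doc) for doc in long_docs]

print("distinct terms per short doc:", avg_distinct_terms(short_bows))
print("distinct terms per extended doc:", avg_distinct_terms(long_bows))
```

The extended documents carry more distinct terms each, which is the extra co-occurrence signal a topic model could draw on. Generated continuations can also drift off topic, which is the noise problem the abstract says the extended neural topic model is meant to reduce.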
Taxonomy
Topics: Topic Modeling · Advanced Text Analysis Techniques · Recommender Systems and Techniques
