GloCOM: A Short Text Neural Topic Model via Global Clustering Context
Quang Duc Nguyen, Tung Nguyen, Duc Anh Nguyen, Linh Ngo Van, Sang Dinh, Thien Huu Nguyen

TL;DR
GloCOM introduces a novel neural topic model for short texts that leverages global clustering contexts derived from pre-trained embeddings to improve topic discovery and document representation quality.
Contribution
The paper presents GloCOM, a new model that constructs global clustering contexts to enhance short text topic modeling, addressing data and label sparsity issues.
Findings
Outperforms state-of-the-art models in topic quality
Improves document representation accuracy
Effectively handles data and label sparsity
Abstract
Uncovering hidden topics from short texts is challenging for traditional and neural models due to data sparsity, which limits word co-occurrence patterns, and label sparsity, stemming from incomplete reconstruction targets. Although data aggregation offers a potential solution, existing neural topic models often overlook it due to time complexity, poor aggregation quality, and difficulty in inferring topic proportions for individual documents. In this paper, we propose a novel model, GloCOM (Global Clustering COntexts for Topic Models), which addresses these challenges by constructing aggregated global clustering contexts for short documents, leveraging text embeddings from pre-trained language models. GloCOM can infer both global topic distributions for clustering contexts and local distributions for individual short texts. Additionally, the model incorporates these global contexts to…
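The aggregation idea described above can be illustrated with a minimal sketch. It assumes k-means as the clustering step and uses toy 2-D vectors in place of embeddings from a pre-trained language model. The function names `kmeans` and `build_global_contexts` are hypothetical, and this is not the authors' implementation. It only groups short texts by cluster and merges their bag-of-words counts into one longer pseudo-document per cluster. That is the kind of aggregated context whose denser word co-occurrence a topic model could exploit.

```python
from collections import Counter
import random


def kmeans(vectors, k, iters=20, seed=0):
    """Plain k-means (Lloyd's algorithm) on lists of floats; returns one cluster id per vector.

    Assumption: k-means stands in for whatever clustering the actual model uses.
    """
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: each vector goes to its nearest centroid (squared Euclidean distance).
        for i, v in enumerate(vectors):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])),
            )
        # Update step: each centroid becomes the mean of its assigned vectors.
        for c in range(k):
            members = [vectors[i] for i in range(len(vectors)) if labels[i] == c]
            if members:  # keep the previous centroid if a cluster ends up empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def build_global_contexts(docs, embeddings, k):
    """Hypothetical helper: cluster short docs by embedding, then merge each
    cluster's word counts into one aggregated 'global context' document."""
    labels = kmeans(embeddings, k)
    contexts = [Counter() for _ in range(k)]
    for doc, c in zip(docs, labels):
        contexts[c].update(doc.split())
    return labels, contexts


# Toy usage: four short texts with made-up 2-D "embeddings" forming two clear groups.
docs = ["cheap flights paris", "hotel deals paris", "python list sort", "sort dict python"]
embeddings = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
labels, contexts = build_global_contexts(docs, embeddings, k=2)
```

Each short text keeps its own cluster label, so a model could still infer a local topic distribution per document while drawing on the richer word statistics of its cluster's aggregated context.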
Taxonomy
Topics: Topic Modeling · Advanced Text Analysis Techniques
