An Enhanced Model-based Approach for Short Text Clustering
Enhao Cheng, Shoujia Zhang, Jianhua Yin, Xuemeng Song, Tian Gan, Liqiang Nie

TL;DR
This paper presents GSDMM, a collapsed Gibbs sampling algorithm for the Dirichlet Multinomial Mixture model, and an improved variant, GSDMM+. The approach handles the sparsity and high dimensionality of short texts and achieves more accurate, finer-grained clustering with less noise.
Contribution
The paper proposes GSDMM+, an enhanced version of GSDMM with adaptive weighting and cluster merging, which improves short text clustering accuracy and efficiency over existing methods.
Findings
GSDMM+ outperforms classical and state-of-the-art methods in clustering quality.
The approach effectively handles sparsity and high dimensionality of short texts.
Experimental results confirm the efficiency and effectiveness of the proposed methods.
Abstract
Short text clustering has become increasingly important with the popularity of social media like Twitter, Google+, and Facebook. Existing methods can be broadly categorized into two paradigms: topic model-based approaches and deep representation learning-based approaches. This task is inherently challenging due to the sparse, large-scale, and high-dimensional characteristics of the short text data. Furthermore, the computational intensity required by representation learning significantly increases the running time. To address these issues, we propose a collapsed Gibbs Sampling algorithm for the Dirichlet Multinomial Mixture model (GSDMM), which effectively handles the sparsity and high dimensionality of short texts while identifying representative words for each cluster. Based on several aspects of GSDMM that warrant further refinement, we propose an improved approach, GSDMM+, designed…
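To make the abstract's description of collapsed Gibbs sampling for a Dirichlet Multinomial Mixture more concrete, here is a minimal sketch in Python. It follows the DMM sampler as it is commonly described in the literature: each document belongs to exactly one cluster, and a document is reassigned with probability proportional to the cluster's size times how well the cluster's word counts match the document. It does not implement the GSDMM+ refinements (adaptive weighting, cluster merging), whose details are not given in this summary. The function name `gsdmm` and the parameters `K`, `alpha`, `beta`, `n_iters`, and `seed` are illustrative assumptions, not the authors' code.

```python
import math
import random
from collections import Counter


def gsdmm(docs, K=8, alpha=0.1, beta=0.1, n_iters=15, seed=0):
    """Sketch of collapsed Gibbs sampling for a Dirichlet Multinomial Mixture.

    docs: list of token lists. Returns one cluster label per document.
    All parameter names and defaults are assumptions for illustration.
    """
    rng = random.Random(seed)
    V = len({w for d in docs for w in d})  # vocabulary size
    D = len(docs)
    m = [0] * K                            # number of documents in each cluster
    n = [0] * K                            # number of tokens in each cluster
    nw = [Counter() for _ in range(K)]     # per-cluster word counts
    counts = [Counter(d) for d in docs]    # per-document word counts
    z = []                                 # current cluster of each document

    def move(i, k, sign):
        # Add (sign=+1) or remove (sign=-1) document i from cluster k.
        m[k] += sign
        n[k] += sign * len(docs[i])
        for w, c in counts[i].items():
            nw[k][w] += sign * c

    # Random initial assignment.
    for i in range(D):
        k = rng.randrange(K)
        z.append(k)
        move(i, k, +1)

    for _ in range(n_iters):
        for i in range(D):
            move(i, z[i], -1)  # exclude document i from the counts
            logp = []
            for k in range(K):
                # Cluster popularity term; the shared denominator
                # (D - 1 + K * alpha) is constant across k and is dropped.
                lp = math.log(m[k] + alpha)
                # Word similarity term, computed in log space.
                for w, c in counts[i].items():
                    for j in range(c):
                        lp += math.log(nw[k][w] + beta + j)
                for j in range(len(docs[i])):
                    lp -= math.log(n[k] + V * beta + j)
                logp.append(lp)
            # Normalize in a numerically stable way, then sample a cluster.
            mx = max(logp)
            weights = [math.exp(lp - mx) for lp in logp]
            z[i] = rng.choices(range(K), weights=weights)[0]
            move(i, z[i], +1)
    return z
```

In this kind of sampler `K` acts as an upper bound: clusters can become empty during sampling, so the number of non-empty clusters at the end may be smaller than `K`. Working in log space avoids underflow when the per-word products become very small.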
Taxonomy
Topics: Sentiment Analysis and Opinion Mining · Advanced Clustering Algorithms Research · Text and Document Classification Technologies
