ClusTop: An unsupervised and integrated text clustering and topic extraction framework
Zhongtao Chen, Chenghu Mi, Siwei Duo, Jingfei He, Yatong Zhou

TL;DR
ClusTop is an unsupervised framework that simultaneously performs text clustering and topic extraction by integrating a specialized language model, improving the quality of both tasks through mutual reinforcement.
Contribution
This paper introduces a novel unsupervised framework that unifies text clustering and topic extraction, leveraging an enhanced language model to improve both tasks simultaneously.
Findings
Achieves high-quality clustering results
Extracts cluster-specific topics at the same time as it clusters
Provides benchmarks for model combinations
Abstract
Text clustering and topic extraction are two important tasks in text mining. Usually, these two tasks are performed separately. For topic extraction to facilitate clustering, we can first project texts into a topic space and then perform a clustering algorithm to obtain clusters. To promote topic extraction by clustering, we can first obtain clusters with a clustering algorithm and then extract cluster-specific topics. However, this naive strategy ignores the fact that text clustering and topic extraction are strongly correlated and follow a chicken-and-egg relationship. Performing them separately fails to make them mutually benefit each other to achieve the best overall performance. In this paper, we propose an unsupervised text clustering and topic extraction framework (ClusTop) which integrates text clustering and topic extraction into a unified framework and can achieve high-quality…
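To make the "cluster first, then extract cluster-specific topics" baseline from the abstract concrete, here is a minimal sketch of its second stage. It is not the ClusTop method. It assumes cluster labels have already been produced by some clustering algorithm, and the function names, the whitespace tokenizer, and the scoring rule (in-cluster term frequency weighted by the log of inverse cluster frequency) are illustrative assumptions rather than anything specified in the paper.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def cluster_topics(docs, labels, top_k=2):
    """Return the top_k highest-scoring terms for each cluster label.

    Assumed score: (term frequency inside the cluster)
                   * log(number of clusters / clusters containing the term).
    Terms that occur in every cluster therefore score 0.
    """
    # Pool the token counts of all documents that share a label.
    per_cluster = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        per_cluster[label].update(tokenize(doc))

    # Count in how many clusters each term appears.
    n_clusters = len(per_cluster)
    cluster_freq = Counter()
    for counts in per_cluster.values():
        cluster_freq.update(counts.keys())

    topics = {}
    for label, counts in per_cluster.items():
        total = sum(counts.values())
        scored = [
            ((count / total) * math.log(n_clusters / cluster_freq[term]), term)
            for term, count in counts.items()
        ]
        # Highest score first; ties broken alphabetically for determinism.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        topics[label] = [term for _, term in scored[:top_k]]
    return topics


# Toy input: labels are assumed to come from a separate clustering step.
example_docs = [
    "the stock market prices fall",
    "market investors sell stock",
    "the team wins the football match",
    "football team signs player",
]
example_labels = [0, 0, 1, 1]
example_topics = cluster_topics(example_docs, example_labels)
```

Because the labels are fixed before any topic is extracted, a poor clustering cannot be corrected by the topics it yields. This one-way dependency is the limitation the abstract attributes to running the two tasks separately.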
Taxonomy
Topics: Advanced Text Analysis Techniques · Service-Oriented Architecture and Web Services · Web Data Mining and Analysis
