Hybrid Multisource Feature Fusion for the Text Clustering
Jiaxuan Chen, Shenglin Gui

TL;DR
This paper introduces a hybrid multisource feature fusion framework for text clustering that combines multiple feature sources and similarity matrices to improve clustering performance, outperforming recent algorithms on various datasets.
Contribution
The paper proposes a novel HMFF framework that fuses features from multiple sources using mutual similarity matrices and dimensionality reduction, enhancing clustering accuracy.
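The fusion idea can be illustrated with a minimal sketch. This is not the authors' implementation: the two feature sources used here (raw term counts and TF-IDF) are hypothetical stand-ins for the paper's feature models, and the choices of cosine similarity, column-wise concatenation, and SVD-based reduction are assumptions made for illustration only. All function names are invented for this example. The final step uses k-means, which the taxonomy lists as the associated method.

```python
import numpy as np

# Toy corpus (illustrative only, not from the paper).
docs = [
    "cats purr and cats sleep",
    "dogs bark and dogs run",
    "kittens and cats purr",
    "puppies and dogs bark",
]

def term_counts(texts):
    """Source 1 (assumed): bag-of-words count matrix, one row per document."""
    vocab = sorted({w for t in texts for w in t.split()})
    index = {w: j for j, w in enumerate(vocab)}
    X = np.zeros((len(texts), len(vocab)))
    for i, t in enumerate(texts):
        for w in t.split():
            X[i, index[w]] += 1.0
    return X

def tfidf(counts):
    """Source 2 (assumed): counts reweighted by a smoothed inverse document frequency."""
    df = (counts > 0).sum(axis=0)
    idf = np.log((1 + counts.shape[0]) / (1 + df)) + 1.0
    return counts * idf

def cosine_similarity_matrix(X):
    """Document-by-document cosine similarity for one feature source."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xn = X / np.where(norms == 0, 1.0, norms)
    return Xn @ Xn.T

def fuse_and_reduce(sim_matrices, n_components):
    """Concatenate the similarity matrices column-wise, then reduce with SVD."""
    fused = np.hstack(sim_matrices)          # shape: (n_docs, n_docs * n_sources)
    fused = fused - fused.mean(axis=0)       # center columns before the SVD
    U, S, _ = np.linalg.svd(fused, full_matrices=False)
    return U[:, :n_components] * S[:n_components]

def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means with random initial centers drawn from the data."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            X[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

counts = term_counts(docs)
sims = [cosine_similarity_matrix(counts), cosine_similarity_matrix(tfidf(counts))]
embedding = fuse_and_reduce(sims, n_components=2)
labels = kmeans(embedding, k=2)
print(labels)
```

Each row of the fused matrix describes one document by its similarity to every other document under each source, so documents that look alike under several representations end up close together after reduction. Swapping in other feature sources only requires adding another similarity matrix to the `sims` list.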
Findings
Outperforms recent algorithms on 7 of 11 benchmark datasets
Achieves leading performance on the remaining 4 datasets
Effectively clusters COVID-19 data with unknown cluster count
Abstract
Text clustering is an unsupervised text mining method used to partition a large number of text documents into groups. It has been reported that text clustering algorithms have difficulty outperforming supervised methods and that their clustering performance depends heavily on the chosen text features. Many text feature generation algorithms exist, each of which extracts features from a specific aspect of the text, such as VSM and distributed word embedding; finding a way to obtain features that are as complete as possible from the corpus is therefore key to improving clustering results. In this paper, we present a hybrid multisource feature fusion (HMFF) framework comprising three components: multimodel feature representation, mutual similarity matrices, and feature fusion, in which we construct mutual similarity…
Taxonomy
Topics: Text and Document Classification Technologies · Topic Modeling · Natural Language Processing Techniques
Methods: k-Means Clustering
