Fast Clustering and Topic Modeling Based on Rank-2 Nonnegative Matrix Factorization
Da Kuang, Barry Drake, Haesun Park

TL;DR
This paper introduces HierNMF2 and FlatNMF2, fast methods for hierarchical and flat clustering and topic modeling built on Rank-2 NMF, and reports better speed and quality than K-means, LDA, and standard NMF.
Contribution
The paper proposes novel hierarchical and flat clustering methods based on Rank-2 NMF, provides optimized open source C++ implementations of both, and reports experiments showing gains in both speed and clustering quality.
Findings
Significant reduction in computational time.
Improved clustering and topic modeling quality.
Outperforms K-means, LDA, and standard NMF.
Abstract
The importance of unsupervised clustering and topic modeling is well recognized with ever-increasing volumes of text data. In this paper, we propose a fast method for hierarchical clustering and topic modeling called HierNMF2. Our method is based on fast Rank-2 nonnegative matrix factorization (NMF) that performs binary clustering and an efficient node splitting rule. Further utilizing the final leaf nodes generated in HierNMF2 and the idea of nonnegative least squares fitting, we propose a new clustering/topic modeling method called FlatNMF2 that recovers a flat clustering/topic modeling result in a very simple yet significantly more effective way than any other existing method. We implement highly optimized open source software in C++ for both HierNMF2 and FlatNMF2 for hierarchical and partitional clustering/topic modeling of document data sets. Substantial experimental tests are…
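The binary clustering step described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' optimized C++ implementation: the function names (`nnls_two_columns`, `rank2_nmf`, `binary_split`), the random initialization, the fixed iteration count, and the argmax assignment rule are all assumptions made for illustration. The sketch alternates two nonnegative least squares solves, exploiting the fact that with only two columns each solve reduces to checking the unconstrained solution and the two single-column fits.

```python
import numpy as np


def nnls_two_columns(B, Y):
    """Solve min_{G >= 0} ||B @ G - Y||_F when B has exactly two columns.

    B has shape (m, 2) and Y has shape (m, n); the result G has shape (2, n).
    Hypothetical helper written for this sketch.
    """
    BtB = B.T @ B
    BtY = B.T @ Y
    eps = 1e-12  # small ridge so the solve does not fail on a singular BtB
    # Unconstrained least squares solution for every column of Y at once.
    G = np.linalg.solve(BtB + eps * np.eye(2), BtY)
    bad = (G < 0).any(axis=0)
    if bad.any():
        # Where the unconstrained solution is infeasible, the optimum uses
        # at most one column of B: fit each column alone, clipped at zero.
        g1 = np.maximum(BtY[0, bad] / (BtB[0, 0] + eps), 0.0)
        g2 = np.maximum(BtY[1, bad] / (BtB[1, 1] + eps), 0.0)
        # Squared residuals of the two fits, dropping the shared ||y||^2 term.
        r1 = -2.0 * g1 * BtY[0, bad] + g1 ** 2 * BtB[0, 0]
        r2 = -2.0 * g2 * BtY[1, bad] + g2 ** 2 * BtB[1, 1]
        use_first = r1 <= r2
        G[0, bad] = np.where(use_first, g1, 0.0)
        G[1, bad] = np.where(use_first, 0.0, g2)
    return G


def rank2_nmf(A, n_iter=50, seed=0):
    """Approximate a nonnegative A (terms x documents) as W @ H with rank 2."""
    rng = np.random.default_rng(seed)
    W = rng.random((A.shape[0], 2))
    H = None
    for _ in range(n_iter):
        H = nnls_two_columns(W, A)        # fix W, solve for H
        W = nnls_two_columns(H.T, A.T).T  # fix H, solve for W
    return W, H


def binary_split(H):
    """Assign each document to the factor with the larger coefficient."""
    return np.argmax(H, axis=0)


if __name__ == "__main__":
    # Toy term-document matrix: documents 0-2 use terms 0-2, documents 3-5 use terms 3-5.
    A = np.zeros((6, 6))
    A[:3, :3] = 1.0
    A[3:, 3:] = 1.0
    W, H = rank2_nmf(A)
    print(binary_split(H))
```

In a hierarchical scheme, a split like this would be applied recursively to the documents assigned to each side, with a separate rule deciding which node to split next; that rule is not sketched here.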
Taxonomy
Topics: Advanced Text Analysis Techniques · Text and Document Classification Technologies · Face and Expression Recognition
