Hybrid Clustering based on Content and Connection Structure using Joint Nonnegative Matrix Factorization
Rundong Du, Barry Drake, Haesun Park

TL;DR
This paper introduces a hybrid clustering method that combines content and connection structure using joint nonnegative matrix factorization, improving clustering quality on datasets with both text and graph data.
Contribution
The paper proposes a framework that jointly optimizes the NMF and SymNMF objectives, together with a block coordinate descent algorithm for hybrid clustering. The formulation extends to hypergraph and multi-feature data.
Findings
The hybrid method outperforms standard NMF and SymNMF in clustering quality.
It is effective for data sets that carry both content and connection information.
It applies to real-world data such as citation networks and adapts to multiple features.
Abstract
We present a hybrid method for latent information discovery on data sets containing both text content and connection structure, based on constrained low rank approximation. The new method jointly optimizes the Nonnegative Matrix Factorization (NMF) objective function for text clustering and the Symmetric NMF (SymNMF) objective function for graph clustering. We propose an effective algorithm for the joint NMF objective function, based on a block coordinate descent (BCD) framework. The proposed hybrid method discovers content associations via latent connections found using SymNMF. With a natural conversion of the problem, the method can also be applied when a hypergraph formulation is used or when the content is associated with hypergraph edges. Experimental results show that by simultaneously utilizing both content and connection structure, our hybrid method produces higher quality…
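To make the joint formulation concrete, here is a minimal sketch of a block coordinate descent loop over a combined NMF and SymNMF objective. The objective form is an assumption and is not quoted from the paper: ||X - WH||_F^2 + alpha ||S - Ĥ^T H||_F^2 + beta ||Ĥ - H||_F^2, where X is the term-by-document matrix, S is the symmetric document similarity (graph) matrix, and Ĥ is an auxiliary copy of H that makes each block subproblem convex. The names `joint_nmf_sketch`, `alpha`, `beta`, and `H_hat` are hypothetical, and each block is updated with a single projected gradient step rather than whatever subproblem solver the authors use.

```python
import numpy as np

def joint_objective(X, S, W, H, H_hat, alpha, beta):
    """Assumed joint objective: NMF term + SymNMF-style term + coupling penalty."""
    return (np.linalg.norm(X - W @ H) ** 2
            + alpha * np.linalg.norm(S - H_hat.T @ H) ** 2
            + beta * np.linalg.norm(H_hat - H) ** 2)

def joint_nmf_sketch(X, S, k, alpha=1.0, beta=1.0, n_iter=200, seed=0):
    """Illustrative BCD over the blocks W, H_hat, H (not the authors' code).

    Each block gets one projected gradient step with step size 1/L, where L
    is the Lipschitz constant of that block's gradient, so the objective
    never increases from one update to the next.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, k))
    H = rng.random((k, n))
    H_hat = H.copy()
    eps = 1e-12  # guards against division by zero if a factor collapses
    history = [joint_objective(X, S, W, H, H_hat, alpha, beta)]

    for _ in range(n_iter):
        # Block 1: W, with H fixed (content term only).
        L = 2 * np.linalg.norm(H @ H.T, 2) + eps
        grad = 2 * (W @ H - X) @ H.T
        W = np.maximum(0, W - grad / L)

        # Block 2: H_hat, with H fixed (graph term + coupling penalty).
        L = 2 * alpha * np.linalg.norm(H @ H.T, 2) + 2 * beta + eps
        grad = 2 * alpha * H @ (H.T @ H_hat - S.T) + 2 * beta * (H_hat - H)
        H_hat = np.maximum(0, H_hat - grad / L)

        # Block 3: H, with W and H_hat fixed (all three terms).
        L = (2 * np.linalg.norm(W.T @ W + alpha * H_hat @ H_hat.T, 2)
             + 2 * beta + eps)
        grad = (2 * W.T @ (W @ H - X)
                + 2 * alpha * H_hat @ (H_hat.T @ H - S)
                + 2 * beta * (H - H_hat))
        H = np.maximum(0, H - grad / L)

        history.append(joint_objective(X, S, W, H, H_hat, alpha, beta))

    return W, H, history

# Toy usage: 4 terms x 6 documents, with a graph made of two triangles.
X_toy = np.array([[1, 1, 1, 0, 0, 0],
                  [1, 1, 1, 0, 0, 0],
                  [0, 0, 0, 1, 1, 1],
                  [0, 0, 0, 1, 1, 1]], dtype=float)
S_toy = np.kron(np.eye(2), np.ones((3, 3)))
W_toy, H_toy, hist_toy = joint_nmf_sketch(X_toy, S_toy, k=2)
labels_toy = H_toy.argmax(axis=0)  # cluster of each document = largest row of H
```

In this sketch `alpha` trades off graph structure against text content and `beta` controls how tightly Ĥ tracks H. Both are free parameters here, and the values the paper uses are not given in this summary.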
Taxonomy
Topics: Text and Document Classification Technologies · Advanced Text Analysis Techniques
