Efficient High-Quality Clustering for Large Bipartite Graphs
Renchi Yang, Jieming Shi

TL;DR
This paper introduces two scalable and high-quality clustering methods, HOPE and HOPE+, for large bipartite graphs, significantly improving clustering accuracy and efficiency over existing approaches, especially on billion-edge datasets.
Contribution
The paper proposes novel formulations and optimization frameworks for k-Bipartite Graph Clustering, achieving state-of-the-art performance on large-scale graphs.
Findings
HOPE and HOPE+ outperform 13 competitors in clustering quality.
HOPE+ can process a 1.1-billion-edge graph in 31 minutes.
The methods are highly scalable and effective for real-world large bipartite graphs.
Abstract
A bipartite graph contains inter-set edges between two disjoint vertex sets, and is widely used to model real-world data, such as user-item purchase records, author-article publications, and biological interactions between drugs and proteins. k-Bipartite Graph Clustering (k-BGC) aims to partition the target vertex set of a bipartite graph into k disjoint clusters. Clustering quality is central to the utility of k-BGC in applications such as social network analysis, recommendation systems, text mining, and bioinformatics. Existing approaches to k-BGC either output clustering results of compromised quality because they do not adequately exploit high-order information between vertices, or fail to handle sizable bipartite graphs with billions of edges. Motivated by this, this paper presents two efficient k-BGC solutions, HOPE and HOPE+, which achieve state-of-the-art…
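To make the k-BGC task concrete, here is a minimal toy sketch in Python. It is not HOPE or HOPE+, whose details the abstract above does not give. It only illustrates the problem setting: a bipartite graph stored as (target vertex, other-side vertex) edges, a simple two-hop signal between target vertices (the number of neighbors they share), and a partition of the target set into k clusters. The seed-based assignment rule, the function names, and the example data are all assumptions made for illustration.

```python
from collections import defaultdict

def two_hop_similarity(edges):
    """Count shared neighbors for each ordered pair of target vertices.

    This is a simple two-hop signal (paths target -> neighbor -> target),
    used here only as an illustrative stand-in for "high-order information".
    """
    targets_by_neighbor = defaultdict(set)
    for target, neighbor in edges:
        targets_by_neighbor[neighbor].add(target)

    sim = defaultdict(int)
    for targets in targets_by_neighbor.values():
        for a in targets:
            for b in targets:
                if a != b:
                    sim[(a, b)] += 1
    return sim

def naive_k_bgc(edges, seeds):
    """Toy baseline (an assumption, not the paper's method): assign each
    target vertex to the seed vertex it shares the most neighbors with.
    The number of clusters k equals len(seeds); ties go to the first seed.
    """
    sim = two_hop_similarity(edges)
    targets = sorted({target for target, _ in edges})
    labels = {}
    for u in targets:
        if u in seeds:
            labels[u] = seeds.index(u)
        else:
            scores = [sim.get((u, s), 0) for s in seeds]
            labels[u] = scores.index(max(scores))
    return labels

# Hypothetical user-item purchase records: users are the target vertex set.
edges = [
    ("u1", "a"), ("u1", "b"),
    ("u2", "a"), ("u2", "b"),
    ("u3", "x"), ("u3", "y"),
    ("u4", "x"), ("u4", "y"),
    ("u5", "a"),
]
labels = naive_k_bgc(edges, seeds=["u1", "u3"])  # k = 2
print(labels)
```

Materializing pairwise similarities like this grows quadratically in the number of target vertices that share a neighbor, so it is only workable on tiny inputs. That cost is one reason scalability matters at the billion-edge scale the abstract describes.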
Taxonomy
Topics: Complex Network Analysis Techniques · Advanced Clustering Algorithms Research
Methods: Sparse Evolutionary Training · High-Order Proximity preserved Embedding
