Efficient High-Quality Clustering for Large Bipartite Graphs

Renchi Yang; Jieming Shi

arXiv:2312.16926 · cs.SI · December 29, 2023 · 1 citation

TL;DR

This paper introduces two scalable and high-quality clustering methods, HOPE and HOPE+, for large bipartite graphs, significantly improving clustering accuracy and efficiency over existing approaches, especially on billion-edge datasets.

Contribution

The paper proposes novel formulations and optimization frameworks for k-Bipartite Graph Clustering, achieving state-of-the-art performance on large-scale graphs.

Findings

01. HOPE and HOPE+ outperform 13 competitors in clustering quality.

02. HOPE+ can process a 1.1-billion-edge graph in 31 minutes.

03. The methods are highly scalable and effective for real-world large bipartite graphs.

Abstract

A bipartite graph contains inter-set edges between two disjoint vertex sets and is widely used to model real-world data such as user-item purchase records, author-article publications, and biological interactions between drugs and proteins. k-Bipartite Graph Clustering (k-BGC) aims to partition the target vertex set of a bipartite graph into k disjoint clusters. Clustering quality is important to the utility of k-BGC in applications such as social network analysis, recommendation systems, text mining, and bioinformatics. Existing approaches to k-BGC either produce clusterings of compromised quality because they inadequately exploit high-order information between vertices, or fail to handle sizable bipartite graphs with billions of edges. Motivated by this, this paper presents two efficient k-BGC solutions, HOPE and HOPE+, which achieve state-of-the-art…
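To make the k-BGC problem statement concrete, the sketch below clusters the target vertices of a toy user-item graph into k groups. It is a generic baseline, not HOPE or HOPE+: it runs plain k-means on normalized biadjacency rows, so it only sees each vertex's direct neighbors and captures no high-order information. All names (`biadjacency_rows`, `kmeans`, the toy edge list) are hypothetical and chosen for illustration only.

```python
from math import sqrt


def biadjacency_rows(edges, num_u, num_v):
    """Build one L2-normalized feature row per target vertex u from (u, v) edges."""
    rows = [[0.0] * num_v for _ in range(num_u)]
    for u, v in edges:
        rows[u][v] = 1.0
    for row in rows:
        norm = sqrt(sum(x * x for x in row))
        if norm > 0:
            for j in range(num_v):
                row[j] /= norm
    return rows


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(rows, k, iters=20):
    """Plain k-means with deterministic farthest-first initialization."""
    centers = [list(rows[0])]
    while len(centers) < k:
        # Pick the row farthest from all centers chosen so far.
        far = max(rows, key=lambda r: min(sq_dist(r, c) for c in centers))
        centers.append(list(far))
    labels = [0] * len(rows)
    for _ in range(iters):
        # Assignment step: nearest center for every target vertex.
        labels = [min(range(k), key=lambda c: sq_dist(r, centers[c])) for r in rows]
        # Update step: each center becomes the mean of its members.
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy user-item graph (assumed data): users 0-2 buy items 0-2, users 3-5 buy items 3-5.
edges = [(0, 0), (0, 1), (1, 1), (1, 2), (2, 0), (2, 2),
         (3, 3), (3, 4), (4, 4), (4, 5), (5, 3), (5, 5)]
labels = kmeans(biadjacency_rows(edges, num_u=6, num_v=6), k=2)
```

On this toy graph the two user groups share no items, so the baseline separates them cleanly. The dense row representation is for readability only and would not scale to the billion-edge graphs the paper targets.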

Code & Models

Repositories

hkbu-lagas/hope (Official)

Taxonomy

Topics: Complex Network Analysis Techniques · Advanced Clustering Algorithms Research

Methods: Sparse Evolutionary Training · High-Order Proximity preserved Embedding