Effective and Scalable Clustering on Massive Attributed Graphs
Renchi Yang, Jieming Shi, Yin Yang, Keke Huang, Shiqi Zhang, and Xiaokui Xiao

TL;DR
This paper introduces ACMin, a scalable and high-quality clustering method for massive attributed graphs that outperforms existing solutions in both speed and accuracy, even on billion-scale datasets.
Contribution
ACMin presents a novel attributed multi-hop conductance measure and a linear-time optimization algorithm for efficient, high-quality clustering on massive attributed graphs.
Findings
ACMin outperforms 11 competitors in quality and speed.
ACMin handles billion-scale graphs within hours.
The new measure effectively captures topological and attribute coherence.
Abstract
Given a graph G where each node is associated with a set of attributes, and a parameter k specifying the number of output clusters, k-attributed graph clustering (k-AGC) groups nodes in G into k disjoint clusters, such that nodes within the same cluster share similar topological and attribute characteristics, while those in different clusters are dissimilar. This problem is challenging on massive graphs, e.g., with millions of nodes and billions of edges. For such graphs, existing solutions either incur prohibitively high costs, or produce clustering results with compromised quality. In this paper, we propose ACMin, an effective approach to k-AGC that yields high-quality clusters with cost linear to the size of the input graph G. The main contributions of ACMin are twofold: (i) a novel formulation of the k-AGC problem based on an attributed multi-hop conductance quality measure…
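To make the notion of a conductance-based quality measure concrete, here is a minimal sketch of the classic, single-hop conductance of one cluster on a toy undirected graph. This is the textbook definition (cut edges divided by the smaller side's volume), not ACMin's attributed multi-hop measure, which the excerpt above does not spell out. The function name `conductance`, the edge-list representation, and the toy graph are assumptions chosen for illustration only.

```python
def conductance(edges, cluster):
    """Textbook conductance of `cluster` in an undirected, unweighted graph.

    conductance(C) = cut(C, V \\ C) / min(vol(C), vol(V \\ C)),
    where vol(S) is the sum of node degrees in S.
    Illustrative only: this ignores node attributes and multi-hop structure.
    """
    cluster = set(cluster)
    cut = 0       # edges with exactly one endpoint inside the cluster
    vol_in = 0    # total degree of nodes inside the cluster
    vol_out = 0   # total degree of nodes outside the cluster
    for u, v in edges:
        u_in, v_in = u in cluster, v in cluster
        if u_in != v_in:
            cut += 1
        vol_in += int(u_in) + int(v_in)
        vol_out += int(not u_in) + int(not v_in)
    denom = min(vol_in, vol_out)
    return cut / denom if denom else 0.0


# Hypothetical toy graph: two triangles joined by the single edge (2, 3).
EDGES = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]

good = conductance(EDGES, {0, 1, 2})  # one triangle: a single cut edge
bad = conductance(EDGES, {0, 1, 3})   # mixes both triangles: many cut edges
print(good, bad)
```

A lower value indicates a cluster that is better separated from the rest of the graph; the triangle `{0, 1, 2}` scores lower than the mixed set `{0, 1, 3}`. The function makes one pass over the edge list, so its cost grows linearly with the number of edges.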
