Weighted Flow Diffusion for Local Graph Clustering with Node Attributes: an Algorithm and Statistical Guarantees
Shenghao Yang, Kimon Fountoulakis

TL;DR
This paper introduces a local graph clustering algorithm that leverages node attributes and structural information, providing theoretical guarantees and demonstrating improved performance on synthetic and real-world datasets.
Contribution
The paper proposes a novel algorithm that diffuses mass locally in graphs with node attributes. It also gives statistical guarantees for recovering a target cluster under general random graph models.
Findings
The algorithm achieves accurate cluster recovery from a single seed node.
Incorporating node attributes improves clustering performance.
The theoretical guarantees cover random graph models including the stochastic block model and the planted cluster model.
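The sketch below makes the random-graph setting concrete by sampling a planted cluster together with noisy node attributes. Several details are illustrative assumptions and do not come from the paper's exact model: the function name, the Gaussian attribute model, and the parameters `p`, `q`, `mu` and `sigma`.

```python
import numpy as np


def planted_cluster_with_attributes(n, k, p, q, mu, sigma, d, seed=0):
    """Sample a graph with a planted cluster of size k plus Gaussian node attributes.

    Hypothetical parameterisation: pairs with both endpoints in the cluster
    connect with probability p, all other pairs with probability q; attributes
    are N(+mu, sigma^2 I) for cluster nodes and N(-mu, sigma^2 I) otherwise.
    """
    rng = np.random.default_rng(seed)
    labels = np.zeros(n, dtype=bool)
    labels[:k] = True                                  # nodes 0..k-1 form the target cluster
    same = np.logical_and.outer(labels, labels)        # True only for intra-cluster pairs
    prob = np.where(same, p, q)                        # edge probability for every pair
    upper = np.triu(rng.random((n, n)) < prob, k=1)    # sample each unordered pair once
    A = (upper | upper.T).astype(np.int8)              # symmetric adjacency, no self-loops
    means = np.where(labels[:, None], mu, -mu)         # shape (n, 1), broadcast over d
    X = means + sigma * rng.standard_normal((n, d))
    return A, X, labels


A, X, labels = planted_cluster_with_attributes(n=200, k=20, p=0.5, q=0.02, mu=1.0, sigma=1.0, d=5)
```

A cluster that is dense internally but sparsely connected to the rest of the graph is the structural signal. The attribute means `+mu`/`-mu` supply the second, attribute-based signal that a local method can combine with it.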
Abstract
Local graph clustering methods aim to detect small clusters in very large graphs without the need to process the whole graph. They are fundamental and scalable tools for a wide range of tasks such as local community detection, node ranking and node embedding. While prior work on local graph clustering mainly focuses on graphs without node attributes, modern real-world graph datasets typically come with node attributes that provide valuable additional information. We present a simple local graph clustering algorithm for graphs with node attributes, based on the idea of diffusing mass locally in the graph while accounting for both structural and attribute proximities. Using high-dimensional concentration results, we provide statistical guarantees on the performance of the algorithm for the recovery of a target cluster with a single seed node. We give conditions under which a target…
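The following is a minimal sketch of diffusing mass locally while accounting for both structural and attribute proximities. It is not the authors' implementation and rests on two assumptions. First, edges are reweighted by a Gaussian kernel exp(-rho * ||x_u - x_v||^2) on the attributes. Second, an l2-norm flow diffusion is then run from the seed with push-style coordinate updates. The helper names `attribute_weighted_edges` and `flow_diffusion`, together with the sink capacities, `rho` and `tol`, are hypothetical.

```python
import numpy as np
from collections import deque


def attribute_weighted_edges(edges, X, rho):
    """Weight each undirected edge (u, v) by exp(-rho * ||X[u] - X[v]||^2).

    Assumed Gaussian kernel; rho = 0 recovers the unweighted graph.
    """
    w = {}
    for u, v in edges:
        weight = float(np.exp(-rho * np.sum((X[u] - X[v]) ** 2)))
        w.setdefault(u, {})[v] = weight
        w.setdefault(v, {})[u] = weight
    return w


def flow_diffusion(w, seed, source_mass, sink=None, tol=1e-8, max_pushes=100_000):
    """l2-norm flow diffusion via pushes (coordinate updates on the dual).

    A node whose mass exceeds its sink capacity T[i] keeps T[i] and sends the
    excess to its neighbours in proportion to edge weight; x[i] accumulates
    excess / weighted_degree. Returns (x, mass).
    """
    deg = {u: sum(nbrs.values()) for u, nbrs in w.items()}
    T = deg if sink is None else sink        # assumption: default sink = weighted degree
    mass = {seed: float(source_mass)}
    x = {}
    queue = deque([seed])
    pushes = 0
    while queue and pushes < max_pushes:
        i = queue.popleft()
        excess = mass.get(i, 0.0) - T[i]
        if excess <= tol:                     # stale queue entry: nothing to push
            continue
        pushes += 1
        x[i] = x.get(i, 0.0) + excess / deg[i]
        mass[i] = T[i]
        for j, wij in w[i].items():
            mass[j] = mass.get(j, 0.0) + excess * wij / deg[i]
            if mass[j] - T[j] > tol:
                queue.append(j)
    return x, mass


# Toy graph: {0, 1, 2, 5, 6} share attributes; node 3 hangs off the seed but is far in attribute space.
edges = [(0, 1), (0, 2), (1, 5), (2, 6), (5, 6), (0, 3), (3, 4)]
X = np.zeros((7, 2))
X[[3, 4]] = 5.0
sink = {u: 1.0 for u in range(7)}
x_attr, m_attr = flow_diffusion(attribute_weighted_edges(edges, X, rho=1.0), 0, 4.0, sink)
x_plain, m_plain = flow_diffusion(attribute_weighted_edges(edges, X, rho=0.0), 0, 4.0, sink)
print(sorted(x_attr))  # [0, 1, 2]
```

With `rho=0.0` the seed sends a full unit of mass to node 3. With `rho=1.0` the attribute-weighted edge carries essentially nothing, so the mass spreads only among nodes that are close in both graph and attribute space. The sketch returns the support of `x` as the cluster; rounding `x` with a sweep cut is a common alternative.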
Taxonomy
Topics: Complex Network Analysis Techniques · Advanced Clustering Algorithms Research · Human Mobility and Location-Based Analysis
