Model-based clustering of large networks
Duy Q. Vu, David R. Hunter, Michael Schweinberger

TL;DR
This paper introduces a scalable, flexible model-based clustering framework for large discrete-valued networks, built on finite mixture models fitted with variational generalized EM algorithms and paired with parametric bootstrap standard errors. It is demonstrated on a network with more than 131,000 nodes and 17 billion edge variables.
Contribution
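Relative to earlier model-based clustering work for networks, it offers a more flexible modeling framework through new exponential-family parameterizations, an improved variational estimation algorithm, and standard error estimation via a parametric bootstrap. Together these enable analysis of much larger networks than previously reported.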
Findings
Clustered a network with more than 131,000 nodes and 17 billion edge variables
Developed a scalable variational generalized EM algorithm with MM-augmented E-steps
Provided parametric bootstrap standard error estimates for model parameters
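As a quick sanity check on the scale quoted above, the sketch below counts edge variables for a network of a given size. It assumes that "edge variables" means one variable per ordered pair of distinct nodes (a directed network without self-loops); that reading is an assumption rather than something this summary states, and the function name `edge_variable_count` is hypothetical. The node count of 131,000 is used as a lower bound.

```python
def edge_variable_count(n_nodes: int, directed: bool = True) -> int:
    """Number of possible edge variables among n_nodes, excluding self-loops.

    Assumption: a directed network has one variable per ordered pair (i, j),
    an undirected network one variable per unordered pair {i, j}.
    """
    ordered_pairs = n_nodes * (n_nodes - 1)
    return ordered_pairs if directed else ordered_pairs // 2


n = 131_000  # lower bound on the node count quoted above
print(edge_variable_count(n))                  # 17160869000, about 17 billion
print(edge_variable_count(n, directed=False))  # 8580434500
```

Under the ordered-pair reading, roughly 131,000 nodes give about 17 billion edge variables, which matches the quoted figure. The count grows quadratically in the number of nodes, which is why scalability of the estimation algorithm matters.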
Abstract
We describe a network clustering framework, based on finite mixture models, that can be applied to discrete-valued networks with hundreds of thousands of nodes and billions of edge variables. Relative to other recent model-based clustering work for networks, we introduce a more flexible modeling framework, improve the variational-approximation estimation algorithm, discuss and implement standard error estimation via a parametric bootstrap approach, and apply these methods to much larger data sets than those seen elsewhere in the literature. The more flexible framework is achieved through introducing novel parameterizations of the model, giving varying degrees of parsimony, using exponential family models whose structure may be exploited in various theoretical and algorithmic ways. The algorithms are based on variational generalized EM algorithms, where the E-steps are augmented by a…
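To make the idea of fitting a finite mixture model to a network with a variational EM algorithm more concrete, here is a minimal sketch for the simplest case: an undirected binary network with Bernoulli edges whose probabilities depend only on the cluster memberships of the two endpoints. This is not the authors' implementation. It uses the standard mean-field fixed-point update in the E-step rather than the augmented E-step the abstract refers to, it stores the network densely and so would not scale to the sizes discussed above, and the function name `variational_em_sbm` and all parameter names are hypothetical.

```python
import numpy as np


def variational_em_sbm(Y, K, n_iter=50, seed=0):
    """Mean-field variational EM for a Bernoulli mixture model on a network.

    Y : (n, n) symmetric 0/1 adjacency matrix (diagonal is ignored).
    K : number of clusters.
    Returns (tau, alpha, pi): soft memberships (n, K), cluster
    proportions (K,), and between-cluster edge probabilities (K, K).
    """
    rng = np.random.default_rng(seed)
    n = Y.shape[0]
    tau = rng.dirichlet(np.ones(K), size=n)  # random soft initial memberships
    mask = 1.0 - np.eye(n)                   # excludes self-pairs (i, i)
    eps = 1e-10
    for _ in range(n_iter):
        # M-step: closed-form updates given the current memberships.
        alpha = tau.mean(axis=0)
        num = tau.T @ (Y * mask) @ tau       # expected edge counts per block
        den = tau.T @ mask @ tau             # expected pair counts per block
        pi = np.clip(num / np.maximum(den, eps), eps, 1.0 - eps)
        # E-step: one simultaneous fixed-point update of all memberships.
        log_tau = (
            np.log(alpha + eps)
            + (Y * mask) @ tau @ np.log(pi).T
            + ((1.0 - Y) * mask) @ tau @ np.log(1.0 - pi).T
        )
        log_tau -= log_tau.max(axis=1, keepdims=True)  # numerical stability
        tau = np.exp(log_tau)
        tau /= tau.sum(axis=1, keepdims=True)
    return tau, alpha, pi


# Usage on a small synthetic network with two planted clusters.
rng = np.random.default_rng(1)
labels = np.repeat([0, 1], 20)
P = np.where(labels[:, None] == labels[None, :], 0.8, 0.05)
upper = np.triu(rng.random((40, 40)) < P, k=1)
Y = (upper | upper.T).astype(float)

tau, alpha, pi = variational_em_sbm(Y, K=2)
print(tau.argmax(axis=1))  # hard cluster assignment for each node
```

Each iteration costs on the order of n squared operations because the adjacency matrix is dense. Handling hundreds of thousands of nodes, as the abstract describes, would need sparse storage and the more careful updates developed in the paper.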
