Scalable Community Detection in Massive Networks Using Aggregated Relational Data
Timothy Jones, Owen G. Ward, Yiran Jiang, John Paisley, Tian Zheng

TL;DR
This paper introduces a scalable, parallel inference method for the mixed membership stochastic blockmodel (MMSB) that efficiently handles massive networks by leveraging aggregated relational data and nodal information.
Contribution
The authors develop a novel mini-batch inference strategy for MMSB that enables efficient community detection in networks with millions of nodes using parallel stochastic variational inference.
Findings
Successfully applied to a citation network with over two million nodes.
Achieves better convergence and parameter recovery on simulated networks.
Captures meaningful community structure in large-scale real-world networks.
Abstract
The mixed membership stochastic blockmodel (MMSB) is a popular Bayesian network model for community detection. Fitting such large Bayesian network models quickly becomes computationally infeasible when the number of nodes grows into hundreds of thousands and millions. In this paper we propose a novel mini-batch strategy based on aggregated relational data that leverages nodal information to fit MMSB to massive networks. We describe a scalable inference method that can utilize nodal information that often accompanies real-world networks. Conditioning on this extra information leads to a model that admits a parallel stochastic variational inference algorithm, utilizing stochastic gradients of bipartite graphs formed from aggregated network ties between node subpopulations. We apply our method to a citation network with over two million nodes and 25 million edges, capturing explainable…
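To make the two ingredients named in the abstract concrete, here is a minimal sketch in Python. It simulates a small network from the standard MMSB generative process, then collapses the adjacency matrix into counts of ties from each node to each node subpopulation, which is one common reading of "aggregated relational data". It is not the authors' inference algorithm. The function names, the parameter values, and the randomly drawn `group_labels` (standing in for a nodal covariate such as a paper's field) are all assumptions made for illustration.

```python
import numpy as np


def simulate_mmsb(n_nodes, n_communities, alpha, block_matrix, rng):
    """Draw a directed network from the standard MMSB generative process."""
    # Each node gets a mixed-membership vector pi_i ~ Dirichlet(alpha).
    pi = rng.dirichlet(np.full(n_communities, alpha), size=n_nodes)
    adjacency = np.zeros((n_nodes, n_nodes), dtype=np.int64)
    for i in range(n_nodes):
        for j in range(n_nodes):
            if i == j:
                continue  # no self-ties
            # For this ordered pair, sender and receiver each pick a community.
            z_send = rng.choice(n_communities, p=pi[i])
            z_recv = rng.choice(n_communities, p=pi[j])
            # A tie forms with the probability attached to that community pair.
            adjacency[i, j] = rng.binomial(1, block_matrix[z_send, z_recv])
    return pi, adjacency


def aggregate_ties(adjacency, group_labels, n_groups):
    """Count ties sent by each node to each subpopulation (hypothetical helper)."""
    n_nodes = adjacency.shape[0]
    # One-hot indicator of subpopulation membership, shape (n_nodes, n_groups).
    indicator = np.zeros((n_nodes, n_groups), dtype=np.int64)
    indicator[np.arange(n_nodes), group_labels] = 1
    # Entry (i, g) is the number of ties from node i into subpopulation g.
    return adjacency @ indicator


rng = np.random.default_rng(0)
n_nodes, n_communities, n_groups = 30, 3, 4

# Assumed block structure: dense within communities, sparse between them.
block_matrix = np.full((n_communities, n_communities), 0.02)
np.fill_diagonal(block_matrix, 0.5)

pi, adjacency = simulate_mmsb(n_nodes, n_communities, 0.1, block_matrix, rng)

# Assumed nodal covariate: a subpopulation label per node, drawn at random here.
group_labels = rng.integers(0, n_groups, size=n_nodes)
ard = aggregate_ties(adjacency, group_labels, n_groups)
```

The aggregated matrix `ard` has one row per node and one column per subpopulation, so it defines a node-to-subpopulation bipartite structure that is much smaller than the full adjacency matrix when the number of subpopulations is far below the number of nodes.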
Taxonomy
Topics: Bayesian Modeling and Causal Inference · Complex Network Analysis Techniques · Advanced Clustering Algorithms Research
