Sequential Stratified Regeneration: MCMC for Large State Spaces with an Application to Subgraph Count Estimation
Carlos H. C. Teixeira, Mayank Kakodkar, Vinícius Dias, Wagner Meira, Jr., Bruno Ribeiro

TL;DR
This paper introduces Ripple, a scalable MCMC-based estimator using sequential stratified regenerations that efficiently and accurately estimates counts of large subgraphs in massive graphs.
Contribution
The paper presents Ripple, a novel MCMC estimator with sequential stratified regenerations that significantly improves the scalability of estimating counts of large graph substructures.
Findings
Ripple accurately estimates counts of subgraphs with up to 12 nodes.
Ripple scales to Markov chain state spaces with as many as 10^43 states within hours.
The method is highly parallelizable and consistent.
Abstract
This work considers the general task of estimating the sum of a bounded function over the edges of a graph, given neighborhood query access, in settings where accessing the entire network is prohibitively expensive. To estimate this sum, prior work proposes Markov chain Monte Carlo (MCMC) methods that use random walks started at some seed vertex and whose equilibrium distribution is the uniform distribution over all edges, eliminating the need to iterate over all edges. Unfortunately, these existing estimators are not scalable to massive real-world graphs. In this paper, we introduce Ripple, an MCMC-based estimator that achieves unprecedented scalability by stratifying the Markov chain state space into ordered strata with a new technique that we denote sequential stratified regenerations. We show that the Ripple estimator is consistent, highly parallelizable, and scales well. We…
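The random-walk estimators the abstract attributes to prior work can be illustrated with one common form, assumed here: regenerative "tours" that start and end at the seed vertex. For a simple random walk on an undirected, connected graph, each directed edge is traversed at rate 1/(2|E|) at equilibrium and the expected return time to a seed v0 is 2|E|/d(v0), so the expected sum of a symmetric f over one tour is (2/d(v0)) times the sum of f over the undirected edges. The Python below is a minimal sketch under those assumptions; `build_adjacency` and `tour_estimate` are hypothetical helper names, and the code shows only this single-seed regenerative estimator, not Ripple's sequential stratified regenerations.

```python
import random
from collections import defaultdict


def build_adjacency(edges):
    """Build an undirected adjacency list from (u, v) pairs (hypothetical helper)."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return adj


def tour_estimate(adj, f, seed, num_tours, rng):
    """Estimate sum_{e in E} f(e) for a symmetric f via random-walk tours
    that regenerate at `seed` (hypothetical helper, not Ripple itself)."""
    total = 0.0
    for _ in range(num_tours):
        u = seed
        while True:
            # One simple-random-walk step: needs only a neighborhood query on u.
            v = rng.choice(adj[u])
            total += f(u, v)
            u = v
            if u == seed:
                break  # back at the seed: the tour ends and the chain regenerates
    # Each tour sum has expectation (1/d(seed)) * sum over directed edges,
    # i.e. (2/d(seed)) * sum over undirected edges when f is symmetric.
    return len(adj[seed]) * total / (2 * num_tours)


# Usage: with f = 1 the estimator targets the number of edges (6 here).
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2), (2, 4)]
adj = build_adjacency(edges)
rng = random.Random(42)
print(tour_estimate(adj, lambda u, v: 1.0, seed=0, num_tours=20000, rng=rng))
```

The printed value should be close to 6. Because tours are independent and identically distributed, they can be run in parallel and their sums pooled; the difficulty on massive state spaces is that the expected tour length, 2|E|/d(v0), grows with the number of edges, which is the scalability problem the abstract describes.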
Taxonomy
Topics: Markov Chains and Monte Carlo Methods · Bayesian Modeling and Causal Inference · Statistical Methods and Inference
