Graph Sample and Hold: A Framework for Big-Graph Analytics
Nesreen K. Ahmed, Nick Duffield, Jennifer Neville, Ramana Kompella

TL;DR
The paper introduces Graph Sample and Hold (gSH), a unified streaming sampling framework for big-graph analytics that efficiently estimates multiple graph properties with unbiased estimators from a single sample.
Contribution
It proposes a generic, single-pass streaming sampling framework for big graphs that enables unbiased estimation of various properties while controlling runtime and error.
Findings
Effective on social and web graphs
Produces unbiased estimators for multiple properties
Maintains small state during sampling
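To make the findings above concrete, here is a minimal sketch of how a sample-and-hold pass over an edge stream might look. It is not taken from this summary: it assumes a two-probability scheme in which an arriving edge is kept with probability q if it shares a node with an already sampled edge, and with probability p otherwise. It also assumes the edge count is estimated by weighting each kept edge by the inverse of the probability used to keep it (a Horvitz-Thompson style estimator). The names `sample_and_hold` and `estimate_edge_count` are hypothetical.

```python
import random


def sample_and_hold(edge_stream, p, q, seed=None):
    """Single pass over an edge stream.

    Assumed scheme (not stated in the summary): keep an edge with
    probability q if it touches the current sample, else with probability p.
    Returns a dict mapping each kept edge to the probability used to keep it.
    """
    rng = random.Random(seed)
    sampled = {}      # (u, v) -> probability with which the edge was kept
    touched = set()   # nodes incident to at least one kept edge
    for u, v in edge_stream:
        adjacent = u in touched or v in touched
        prob = q if adjacent else p
        if rng.random() < prob:
            sampled[(u, v)] = prob
            touched.add(u)
            touched.add(v)
    return sampled


def estimate_edge_count(sampled):
    """Inverse-probability estimate of the total number of edges.

    Each kept edge contributes 1/prob, which compensates for the edges
    that were dropped under the same probability.
    """
    return sum(1.0 / prob for prob in sampled.values())


if __name__ == "__main__":
    edges = [(1, 2), (2, 3), (3, 1), (3, 4), (4, 5), (6, 7)]
    sample = sample_and_hold(edges, p=0.5, q=0.9, seed=0)
    print(len(sample), "edges kept; estimated total:", estimate_edge_count(sample))
```

The state kept during the pass is only the sampled edges and their endpoints, which is what lets the sample stay small relative to the stream. Under the same assumptions, estimates of other properties, such as triangle counts, could be built from the same sample by multiplying the inverse probabilities of the edges involved.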
Abstract
Sampling is a standard approach in big-graph analytics; the goal is to efficiently estimate graph properties by consulting a sample of the whole population. A perfect sample is assumed to mirror every property of the whole population. Unfortunately, such a perfect sample is hard to collect in complex populations such as graphs (e.g., web graphs and social networks), where an underlying network connects the units of the population. Therefore, a good sample will be representative in the sense that graph properties of interest can be estimated with a known degree of accuracy. While previous work focused particularly on sampling schemes used to estimate certain graph properties (e.g., triangle count), much less is known for the case when we need to estimate various graph properties with the same sampling scheme. In this paper, we propose a generic stream sampling framework for big-graph…
Taxonomy
Topics: Complex Network Analysis Techniques · Advanced Graph Neural Networks · Software System Performance and Reliability
