Graph Size Estimation
Maciej Kurant, Carter T. Butts, Athina Markopoulou

TL;DR
This paper introduces a novel, efficient method for estimating the size of large, partially known graphs using random walk sampling, significantly reducing the number of samples needed compared to existing techniques.
Contribution
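It presents IE (Induced Edges), an efficient estimator that exploits the edges induced on the sampled nodes, and SafetyMargin, a method that corrects estimators for the dependence between successive random walk samples. Combined, the two yield a random-walk-based graph size estimator.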
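To make the induced-edges idea concrete, here is a minimal sketch for the simplest setting that can be derived by hand: nodes drawn independently and uniformly at random with replacement, with each sampled node's neighbour set visible. It is not necessarily the exact IE formula from the paper. It relies on one fact: two independent uniform draws are adjacent with probability 2|E|/N^2 = d̄/N, where d̄ is the mean degree. Counting adjacent sample pairs and solving for N therefore gives an estimate. The function names and the ring-lattice test graph are hypothetical.

```python
import random
from itertools import combinations


def ie_estimate_uniform(sample, adj):
    """Induced-edges estimate of the node count N (illustrative sketch).

    sample: node ids drawn independently and uniformly, with replacement.
    adj:    dict mapping a node id to the set of its neighbours; only the
            sampled nodes' entries are read.
    """
    n = len(sample)
    # The sample mean degree estimates 2|E| / N.
    mean_degree = sum(len(adj[v]) for v in sample) / n
    pairs = n * (n - 1) // 2
    # Count sample pairs (i < j) whose nodes are joined by an edge.
    # Repeated draws of the same node never count: there are no self-loops.
    induced = sum(1 for u, v in combinations(sample, 2) if v in adj[u])
    if induced == 0:
        return float("inf")  # no induced edge observed yet; sample more
    # E[induced] = pairs * mean_degree / N, so solve for N.
    return mean_degree * pairs / induced


def ring_lattice(n_nodes, k):
    """Test graph: each node links to its k nearest neighbours on each side."""
    return {v: {(v + s) % n_nodes for s in range(-k, k + 1) if s != 0}
            for v in range(n_nodes)}


if __name__ == "__main__":
    rng = random.Random(1)
    graph = ring_lattice(2000, 10)
    draws = [rng.randrange(2000) for _ in range(1000)]
    print(round(ie_estimate_uniform(draws, graph)))  # should be near 2000
```

The estimate uses only the sampled nodes and their neighbour lists, never the full node set. This is what makes induced-edge counting attractive when the graph is only partially known.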
Findings
IE with SafetyMargin requires at least 10 times fewer samples than previous methods.
The approach is effective on real-world networks like Facebook.
The combined method reduces sample complexity substantially.
Abstract
Many online networks are not fully known and are often studied via sampling. Random Walk (RW) based techniques are the current state-of-the-art for estimating nodal attributes and local graph properties, but estimating global properties remains a challenge. In this paper, we are interested in a fundamental property of this type: the graph size N, i.e., the number of its nodes. Existing methods for estimating N (i) are inefficient and (ii) cannot easily be used with RW sampling due to dependence between successive samples. In this paper, we address both problems. First, we propose IE (Induced Edges), an efficient technique for estimating N from an independence sample of the graph's nodes. IE exploits the edges induced on the sampled nodes. Second, we introduce SafetyMargin, a method that corrects estimators for dependence in RW samples. Finally, we combine these two stand-alone techniques…
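The combination of an induced-edges estimator with a dependence correction can be illustrated with a rough sketch, under two loudly stated assumptions.

First, a simple random walk on a connected, non-bipartite undirected graph samples node v with stationary probability deg(v)/2|E|, so the sketch reweights by 1/deg. Under that distribution E[1/deg] = N/2|E|, and for two independent stationary draws E[1{u~v}/(deg u · deg v)] = 1/2|E|. The ratio of the two is N.

Second, the dependence correction is modelled as simply skipping pairs of walk positions that are at most `margin` steps apart. This is a hypothetical reading of SafetyMargin, not the paper's specification. All function names, the margin value, and the test graph are illustrative.

```python
import random


def ring_with_chords(n_nodes, k, n_chords, rng: random.Random):
    """Test graph: ring lattice (k neighbours per side) plus random chords.

    The chords make degrees uneven and let the walk mix quickly; the ring
    keeps the graph connected, and its triangles (k >= 2) make it non-bipartite.
    """
    adj = {v: set() for v in range(n_nodes)}
    for v in range(n_nodes):
        for s in range(1, k + 1):
            u = (v + s) % n_nodes
            adj[v].add(u)
            adj[u].add(v)
    added = 0
    while added < n_chords:
        u, v = rng.randrange(n_nodes), rng.randrange(n_nodes)
        if u != v and v not in adj[u]:
            adj[u].add(v)
            adj[v].add(u)
            added += 1
    return {v: sorted(nbrs) for v, nbrs in adj.items()}


def random_walk(adj, start, length, rng: random.Random):
    """Simple random walk of `length` positions (start included)."""
    walk = [start]
    for _ in range(length - 1):
        walk.append(rng.choice(adj[walk[-1]]))
    return walk


def ie_estimate_rw(walk, adj, margin):
    """Degree-reweighted induced-edges estimate of N from a random walk.

    Pairs of walk positions at most `margin` steps apart are skipped
    (hypothetical stand-in for the SafetyMargin correction).
    """
    n = len(walk)
    deg = [len(adj[v]) for v in walk]
    nbr_sets = {v: set(adj[v]) for v in set(walk)}
    # Under the stationary distribution, E[1/deg] = N / 2|E|.
    mean_inv_deg = sum(1.0 / d for d in deg) / n
    pair_count = 0
    weighted_edges = 0.0
    for i in range(n):
        # Number of kept partners j with j - i > margin.
        pair_count += max(0, n - i - margin - 1)
        for j in range(i + margin + 1, n):
            if walk[j] in nbr_sets[walk[i]]:
                weighted_edges += 1.0 / (deg[i] * deg[j])
    if weighted_edges == 0.0:
        return float("inf")  # no induced edge among the kept pairs
    # For independent stationary draws u, v:
    # E[1{u~v} / (deg u * deg v)] = 1 / 2|E|, so the ratio estimates N.
    return mean_inv_deg * pair_count / weighted_edges
```

On a graph with a known node count, comparing `margin=0` with a larger margin shows why a correction is needed. With no margin, consecutive walk steps, which are always adjacent, are counted as induced edges. This inflates the weighted edge count and biases the estimate of N downward.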
Taxonomy
Topics: Complex Network Analysis Techniques · Caching and Content Delivery · Peer-to-Peer Network Technologies
