Sampling Online Social Networks via Heterogeneous Statistics
Xin Wang, Richard T. B. Ma, Yinlong Xu, Zhipeng Li

TL;DR
This paper introduces a novel adaptive two-stage sampling framework for online social networks that optimally combines multiple heterogeneous sampling statistics to improve measurement accuracy and efficiency.
Contribution
It formulates a mixture sampling problem, derives optimal weights for combining estimators, and proposes an adaptive framework that outperforms traditional methods.
Findings
The adaptive framework achieves higher efficiency than benchmark strategies.
Optimal weights minimize asymptotic variance in mixture estimators.
The two-stage approach effectively identifies the most efficient statistics for sampling.
Abstract
Most sampling techniques for online social networks (OSNs) are based on a particular sampling method on a single graph, which is referred to as a statistic. However, various sampling methods on different graphs could be used in the same OSN, and they may lead to different sampling efficiencies, i.e., asymptotic variances. To utilize multiple statistics for accurate measurements, we formulate a mixture sampling problem, through which we construct an unbiased mixture estimator that minimizes the asymptotic variance. Given fixed sampling budgets for different statistics, we derive the optimal weights to combine the individual estimators; given a fixed total budget, we show that a greedy allocation towards the most efficient statistic is optimal. In practice, the sampling efficiencies of statistics can be quite different for various targets and are unknown before sampling. To solve…
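The two results stated in the abstract (optimal weights under fixed budgets, greedy allocation under a fixed total budget) can be illustrated with a minimal sketch. It assumes the textbook setting of independent unbiased estimators, where estimator i has variance v_i / n_i for a known per-sample asymptotic variance v_i and budget n_i, and where every sample costs the same. Under those assumptions the variance-minimizing weights are the standard inverse-variance weights. The function names and the numbers are hypothetical, and the paper's own derivation may differ in its details.

```python
def optimal_weights(variances, budgets):
    """Inverse-variance weights for combining independent unbiased estimators.

    variances[i]: assumed per-sample asymptotic variance of statistic i.
    budgets[i]:   number of samples drawn with statistic i.
    """
    # Precision of estimator i is 1 / (variances[i] / budgets[i]).
    precisions = [n / v for v, n in zip(variances, budgets)]
    total = sum(precisions)
    return [p / total for p in precisions]


def mixture_variance(variances, budgets):
    """Variance of the optimally weighted mixture estimator."""
    return 1.0 / sum(n / v for v, n in zip(variances, budgets))


def greedy_allocation(variances, total_budget):
    """Spend the whole budget on the statistic with the smallest variance."""
    best = min(range(len(variances)), key=lambda i: variances[i])
    return [total_budget if i == best else 0 for i in range(len(variances))]


# Hypothetical example: two statistics, the second four times more efficient.
variances = [4.0, 1.0]
even_split = [100, 100]
weights = optimal_weights(variances, even_split)
greedy = greedy_allocation(variances, 200)

print(weights)
print(mixture_variance(variances, even_split))
print(mixture_variance(variances, greedy))
```

In this toy setting the greedy allocation gives a lower mixture variance than the even split, which matches the abstract's claim for a fixed total budget. The sketch treats the variances as known; the abstract notes that in practice they are unknown before sampling.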
Taxonomy
Topics: Complex Network Analysis Techniques · Spam and Phishing Detection · HIV, Drug Use, Sexual Risk
