Sampling properties of directed networks
Seung-Woo Son, Claire Christensen, Golnoosh Bizhani, David V. Foster, Peter Grassberger, and Maya Paczuski

TL;DR
This study systematically examines how different sampling methods, especially BFS, distort the topological properties of directed networks, revealing biases that impact the interpretation of network structure and dynamics.
Contribution
It provides a comprehensive analysis of sampling biases on directed networks, comparing BFS and random sampling across multiple real-world datasets, and quantifies their effects on key structural properties.
Findings
Sampling method and coverage significantly alter network properties.
BFS sampling overestimates certain metrics at low coverage.
High coverage sampling yields more accurate structural estimates.
Abstract
For many real-world networks, only a small "sampled" version of the original network may be investigated; those results are then used to draw conclusions about the actual system. Variants of breadth-first search (BFS) sampling, which are based on epidemic processes, are widely used. Although it is well established that BFS sampling fails, in most cases, to capture the IN-component(s) of directed networks, a description of the effects of BFS sampling on other topological properties is all but absent from the literature. To systematically study the effects of sampling biases on directed networks, we compare BFS sampling to random sampling on complete large-scale directed networks. We present new results and a thorough analysis of the topological properties of seven different complete directed networks (prior to sampling), including three versions of Wikipedia, three different sources of…
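The IN-component blind spot can be seen on a toy graph. The sketch below is a minimal illustration, not the authors' code. It assumes a directed graph stored as a Python dict mapping each node to its out-links and assumes a crawler-style BFS that follows outgoing links only, stopping at a node budget. The function names `bfs_sample`, `random_node_sample`, and `induced_edges` and the example graph are hypothetical. Node `"a"` links into the seed but cannot be reached from it, so BFS never samples it, while uniform random node sampling ignores link direction entirely.

```python
import random
from collections import deque


def bfs_sample(out_links, seed, max_nodes):
    """Hypothetical crawler-style BFS: follows outgoing links only.

    Stops once max_nodes distinct nodes have been discovered (the coverage budget).
    """
    visited = {seed}
    order = [seed]
    queue = deque([seed])
    while queue and len(order) < max_nodes:
        node = queue.popleft()
        for nbr in out_links.get(node, ()):
            if len(order) >= max_nodes:
                break
            if nbr not in visited:
                visited.add(nbr)
                order.append(nbr)
                queue.append(nbr)
    return order


def random_node_sample(out_links, k, rng):
    """Uniform random sample of k distinct nodes, ignoring link structure."""
    nodes = set(out_links)
    for targets in out_links.values():
        nodes.update(targets)
    return rng.sample(sorted(nodes), k)


def induced_edges(out_links, sample):
    """Directed edges whose two endpoints both lie in the sample."""
    kept = set(sample)
    return {(u, v) for u in kept for v in out_links.get(u, ()) if v in kept}


# Assumed toy graph: "a" is in the IN-component (it reaches the seed "s" but is
# not reachable from it); {"s", "b", "c"} is a strongly connected core; "d" is OUT.
out_links = {"a": ["s"], "s": ["b"], "b": ["c"], "c": ["s", "d"], "d": []}

bfs = bfs_sample(out_links, "s", max_nodes=10)
print(bfs)                                   # ['s', 'b', 'c', 'd']: "a" is never reached
print(sorted(induced_edges(out_links, bfs)))  # the edge ('a', 's') is lost
print(random_node_sample(out_links, 4, random.Random(0)))
```

Even with an unlimited budget, the BFS sample here contains only the seed's strongly connected component and what it reaches downstream; a uniform node sample can include `"a"` but may in turn break up edges of the core.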
