Accuracy and Scaling Phenomena in Internet Mapping
Aaron Clauset, Cristopher Moore

TL;DR
This paper investigates how traceroute sampling from a small number of sources biases the observed degree distribution of the Internet, shows that such samples can distort the true network properties, and indicates how many sources are needed for more accurate estimation.
Contribution
It analyzes sampling bias in Internet topology measurements both analytically and experimentally, highlighting the need for multiple sources to estimate degree distribution parameters accurately.
Findings
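Traceroute sampling is biased: on Erdős-Rényi (ER) random graphs with mean degree c, the observed degree distribution is P(k) ~ 1/k for k < c, even though the underlying distribution is Poisson.
On graphs with power-law tails P(k) ~ k^-alpha, sampling from a small number of sources can significantly underestimate the exponent alpha when the graph has many more edges than vertices.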
Using a number of sources proportional to the average degree improves estimation accuracy.
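
The findings above can be explored with a small simulation. The sketch below is not the authors' code: it assumes that a traceroute from one source can be approximated by a breadth-first spanning tree (one shortest path per destination), and the names er_graph, bfs_tree_edges, and observed_degrees are hypothetical helpers introduced only for illustration. It builds an ER graph, samples it from a varying number of sources, and reports the mean observed degree next to the true mean degree.

```python
import random
from collections import Counter, deque


def er_graph(n, c, rng):
    """Erdos-Renyi graph G(n, p) with p = c / (n - 1), stored as adjacency sets."""
    p = c / (n - 1)
    adj = [set() for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj


def bfs_tree_edges(adj, source):
    """Edges of one breadth-first spanning tree rooted at `source`.

    Assumption: this stands in for traceroute, with each reachable vertex
    contributing only the edge to its BFS parent.
    """
    parent = {source: None}
    queue = deque([source])
    edges = set()
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                edges.add((min(u, v), max(u, v)))
                queue.append(v)
    return edges


def observed_degrees(adj, sources):
    """Degree of each vertex in the union of BFS trees from `sources`."""
    edges = set()
    for s in sources:
        edges |= bfs_tree_edges(adj, s)
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


if __name__ == "__main__":
    rng = random.Random(0)
    n, c = 2000, 20  # illustrative sizes, not taken from the paper
    adj = er_graph(n, c, rng)
    true_mean = sum(len(nbrs) for nbrs in adj) / n
    for num_sources in (1, 5, 20):
        sources = rng.sample(range(n), num_sources)
        deg = observed_degrees(adj, sources)
        obs_mean = sum(deg.values()) / max(len(deg), 1)
        print(f"sources={num_sources} observed mean degree={obs_mean:.2f} "
              f"true mean degree={true_mean:.2f}")
```

A histogram of deg.values() for a single source can be compared with the true degree histogram to see how the sampled distribution departs from the Poisson shape; adding sources can only add edges to the union, so the observed degrees move toward the true ones.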
Abstract
A great deal of effort has been spent measuring topological features of the Internet. However, it was recently argued that sampling based on taking paths or traceroutes through the network from a small number of sources introduces a fundamental bias in the observed degree distribution. We examine this bias analytically and experimentally. For Erdős-Rényi random graphs with mean degree c, we show analytically that traceroute sampling gives an observed degree distribution P(k) ~ 1/k for k < c, even though the underlying degree distribution is Poisson. For graphs whose degree distributions have power-law tails P(k) ~ k^-alpha, traceroute sampling from a small number of sources can significantly underestimate the value of alpha when the graph has a large excess (i.e., many more edges than vertices). We find that in order to obtain a good estimate of alpha it is necessary to use a number of…
