Estimating Shortest Path Length Distributions via Random Walk Sampling
Minhui Zheng, Bruce D. Spencer

TL;DR
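This paper proposes a method for estimating the shortest path length (SPL) distribution of a large network from a random walk sample. Generalized Hansen-Hurwitz and Horvitz-Thompson estimators correct for the unequal inclusion probabilities of sampled dyads, and estimation accuracy is reported to be high once the walk covers at least 20% of the nodes.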
Contribution
It generalizes the Hansen-Hurwitz and Horvitz-Thompson estimators (and their ratio forms) to sampled dyads with unequal inclusion probabilities, and proposes strategies for approximating a dyad's SPL that depend on the network's degree variability.
Findings
High estimation accuracy with at least 20% node coverage in large networks
Estimation performance improves with network size and stabilizes
Single random walk performs as well as multiple walks
Abstract
In a network, the shortest paths between nodes are of great importance as they allow the fastest and strongest interaction between nodes. However, measuring the shortest paths between all nodes in a large network is computationally expensive. In this paper, we propose a method to estimate the shortest path length (SPL) distribution of a network by random walk sampling. To deal with the unequal inclusion probabilities of dyads (pairs of nodes) in the sample, we generalize the use of the Hansen-Hurwitz and Horvitz-Thompson estimators (and their ratio forms) and apply them to the sampled dyads. Based on the theory of Markov chains, we prove that the selection probability of a dyad is proportional to the product of the degrees of the two nodes. To approximate the actual SPL for a dyad, we use the observed SPL in the induced subgraph for networks with large degree variability, i.e., the…
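The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`random_walk`, `bfs_lengths`, `estimate_spl_distribution`), the adjacency-dict input with integer node labels, and the weighting scheme are all assumptions made for illustration. The sketch runs one random walk, measures SPLs inside the subgraph induced by the visited nodes, and weights each distinct sampled dyad by the inverse of the product of its two degrees, a simplified ratio-type estimator in the spirit of the degree-product result stated above.

```python
import random
from collections import defaultdict, deque


def random_walk(adj, start, steps, rng):
    """Return the nodes visited by a simple random walk of `steps` steps."""
    path = [start]
    node = start
    for _ in range(steps):
        node = rng.choice(adj[node])  # move to a uniformly chosen neighbor
        path.append(node)
    return path


def bfs_lengths(adj, source, allowed):
    """Shortest path lengths from `source`, using only nodes in `allowed`."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v in allowed and v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def estimate_spl_distribution(adj, steps, seed=0):
    """Estimate the SPL distribution from a single random walk.

    Assumption (illustrative): each distinct sampled dyad (u, v) gets weight
    1 / (deg(u) * deg(v)), and the weights are normalized to sum to 1.
    SPLs are the observed ones in the induced subgraph of visited nodes.
    """
    rng = random.Random(seed)
    start = rng.choice(sorted(adj))
    sampled = set(random_walk(adj, start, steps, rng))

    weights = defaultdict(float)
    for u in sorted(sampled):
        for v, d in bfs_lengths(adj, u, sampled).items():
            if u < v:  # count each unordered dyad once
                weights[d] += 1.0 / (len(adj[u]) * len(adj[v]))

    total = sum(weights.values())
    return {d: w / total for d, w in sorted(weights.items())}
```

Calling `estimate_spl_distribution(adj, steps=200)` on an undirected graph given as `{node: [neighbors]}` returns a dict mapping each observed path length to its estimated share of dyads. Because the visited nodes are linked by the walk itself, the induced subgraph is connected and every sampled dyad has a finite observed SPL; that observed SPL can overstate the true one, which is the approximation issue the abstract goes on to address.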
Taxonomy
Topics: Complex Network Analysis Techniques · Stochastic processes and statistical mechanics · Bayesian Methods and Mixture Models
