Efficient Estimation of Shortest-Path Distance Distributions to Samples in Graphs
Alan Zhu, Jiaqi Ma, Qiaozhu Mei

TL;DR
This paper introduces an efficient framework for estimating shortest-path distance distributions to samples in large graphs, aiding in assessing sampling bias and fairness without exhaustive computations.
Contribution
The authors develop a novel, fast estimation method for shortest-path distributions that works across various sampling techniques and graph structures, including community-structured graphs.
Findings
Faster than empirically performing the sampling and shortest-path computation
Accurate on downstream comparison tasks
Handles graphs with community structures
Abstract
As large graph datasets become increasingly common across many fields, sampling is often needed to reduce the graphs into manageable sizes. This procedure raises critical questions about representativeness as no sample can capture the properties of the original graph perfectly, and different parts of the graph are not evenly affected by the loss. Recent work has shown that the distances from the non-sampled nodes to the sampled nodes can be a quantitative indicator of bias and fairness in graph machine learning. However, to our knowledge, there is no method for evaluating how a sampling method affects the distribution of shortest-path distances without actually performing the sampling and shortest-path calculation. In this paper, we present an accurate and efficient framework for estimating the distribution of shortest-path distances to the sample, applicable to a wide range of…
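To make the quantity in the abstract concrete, the following is a minimal sketch of the brute-force baseline the abstract contrasts against: draw a sample, then compute every node's shortest-path distance to its nearest sampled node. It is not the authors' estimation framework. The function names, the adjacency-dict representation, and the uniform node sampler are illustrative assumptions, and the graph is assumed to be undirected and unweighted.

```python
import random
from collections import Counter, deque


def distance_distribution_to_sample(adj, sample):
    """Empirical baseline (assumed setup, not the paper's estimator).

    adj: dict mapping node -> iterable of neighbours (undirected, unweighted).
    sample: iterable of sampled nodes, all present in adj.
    Returns a Counter mapping distance -> number of nodes at that distance
    from their nearest sampled node. Nodes that cannot reach any sampled
    node are counted under float("inf").
    """
    # Multi-source BFS: every sampled node starts at distance 0.
    dist = {s: 0 for s in sample}
    queue = deque(dist)
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)

    counts = Counter(dist.values())
    unreachable = len(adj) - len(dist)
    if unreachable:
        counts[float("inf")] = unreachable
    return counts


def uniform_node_sample(adj, k, seed=0):
    """Hypothetical sampler: k nodes uniformly at random without replacement.

    Assumes node labels are mutually comparable so they can be sorted.
    """
    rng = random.Random(seed)
    return rng.sample(sorted(adj), k)


if __name__ == "__main__":
    # Path graph 0-1-2-3-4 with node 2 sampled.
    path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
    counts = distance_distribution_to_sample(path, [2])
    print(dict(sorted(counts.items())))  # {0: 1, 1: 2, 2: 2}
```

Each call costs one full traversal of the graph per sample drawn, which is the sampling and shortest-path calculation the abstract says the framework avoids when comparing sampling methods.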
Taxonomy
Topics: Data Management and Algorithms · Bayesian Modeling and Causal Inference
