All-Distances Sketches, Revisited: HIP Estimators for Massive Graphs Analysis
Edith Cohen

TL;DR
This paper revisits all-distances sketches (ADS) for massive graphs, introducing HIP estimators that significantly improve variance and estimation quality for graph statistics and data stream distinct counting.
Contribution
It provides a unified exposition of ADS algorithms and introduces HIP estimators that outperform previous methods in variance and accuracy.
Findings
HIP estimators halve the variance of neighborhood cardinality and closeness centrality estimates.
HIP estimators achieve a polynomial variance reduction for general statistics.
HIP estimators improve on HyperLogLog-style MinHash estimates for distinct counting in data streams.
Abstract
Graph datasets with billions of edges, such as social and Web graphs, are prevalent, and scalable computation is critical. All-distances sketches (ADS) [Cohen 1997] are a powerful tool for scalable approximation of statistics. The sketch is a small-size sample of the distance relation of a node which emphasizes closer nodes. Sketches for all nodes are computed using a nearly linear computation, and estimators are applied to sketches of nodes to estimate their properties. We provide, for the first time, a unified exposition of ADS algorithms and applications. We present the Historic Inverse Probability (HIP) estimators, which are applied to the ADS of a node to estimate a large natural class of statistics. For the important special cases of neighborhood cardinalities (the number of nodes within some query distance) and closeness centralities, HIP estimators have at most half the…
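To make the inverse-probability idea concrete, here is a minimal sketch of HIP-style distinct counting over a stream, assuming a bottom-k MinHash sketch. The function names (`unit_hash`, `hip_distinct_count`), the SHA-256-based rank, and the default `k` are illustrative assumptions and are not taken from the paper. Each time an element enters the sketch, the counter grows by the inverse of the probability that a fresh element would have entered at that moment.

```python
import hashlib
import heapq


def unit_hash(item, seed=0):
    """Map an item to a pseudo-random rank in (0, 1].

    Assumption: SHA-256 is used here only as a stand-in for a random hash.
    """
    digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
    return (int.from_bytes(digest[:8], "big") + 1) / 2**64


def hip_distinct_count(stream, k=64, seed=0):
    """Estimate the number of distinct items in `stream` (hypothetical helper).

    Keeps the k smallest ranks seen so far (a bottom-k sketch). Whenever a
    new rank enters the sketch, adds 1 / tau to the estimate, where tau is
    the probability that a fresh item would have entered at that point.
    """
    heap = []          # max-heap of the k smallest ranks, stored negated
    in_sketch = set()  # ranks currently held, to ignore repeated items
    estimate = 0.0
    for item in stream:
        r = unit_hash(item, seed)
        if r in in_sketch:
            continue  # item is already in the sketch: no update
        if len(heap) < k:
            tau = 1.0  # sketch not full: every new item enters
        else:
            tau = -heap[0]  # current k-th smallest rank
            if r >= tau:
                continue  # rank too large: sketch unchanged
        estimate += 1.0 / tau
        heapq.heappush(heap, -r)
        in_sketch.add(r)
        if len(heap) > k:
            evicted = -heapq.heappop(heap)  # drop the largest rank
            in_sketch.discard(evicted)
    return estimate
```

While fewer than `k` distinct items have been seen, every update adds exactly 1, so the count is exact; beyond that point the result is an estimate whose accuracy improves as `k` grows. Repeated items never change the sketch, because their rank is either already stored or no smaller than the current threshold.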
Taxonomy
Topics: Complex Network Analysis Techniques · Data Management and Algorithms · Data Visualization and Analytics
