Edge-Based Wedge Sampling to Estimate Triangle Counts in Very Large Graphs
Duru Türkoğlu, Ata Turk

TL;DR
This paper introduces a hybrid edge-wedge sampling algorithm that efficiently estimates the number of triangles in large, sparse, power-law graphs with high accuracy and low sampling ratios.
Contribution
It proposes a novel hybrid sampling method combining edge and wedge sampling to improve triangle count estimation in large graphs.
Findings
Needs up to 8 times fewer samples than existing methods
Provides estimates with 95% confidence from small samples
Provides accurate estimates for large sparse graphs
Abstract
The number of triangles in a graph is useful to deduce a plethora of important features of the network that the graph is modeling. However, finding the exact value of this number is computationally expensive. Hence, a number of approximation algorithms based on random sampling of edges or wedges (adjacent edge pairs) have been proposed for estimating this value. We argue that for large sparse graphs with power-law degree distribution, random edge sampling requires sampling a large number of edges before providing enough information for accurate estimation, and existing wedge sampling methods lead to biased samples, which in turn lead to less accurate estimates. In this paper, we propose a hybrid algorithm between edge and wedge sampling that addresses the deficiencies of both approaches. We start with uniform edge sampling and then extend each selected edge to form a wedge that is…
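The general idea of sampling an edge uniformly and then extending it to a wedge can be illustrated with a minimal Python sketch. The abstract is truncated before it states how the wedge is chosen, so the extension rule used here (pick a uniformly random other neighbor of the lower-degree endpoint) is an assumption made for illustration, not necessarily the authors' rule. The names `build_adjacency`, `exact_triangles`, and `estimate_triangles` are hypothetical, and vertices are assumed to be integers. For a sampled edge (u, v) that lies in t triangles, a random other neighbor of u closes the wedge with probability t / (deg(u) - 1), so scaling each hit by deg(u) - 1 estimates t, and the sum of t over all edges counts every triangle three times.

```python
import random
from collections import defaultdict


def build_adjacency(edges):
    """Return {vertex: set(neighbors)} for an undirected simple graph."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)
    return adj


def exact_triangles(adj):
    """Exact triangle count, useful for checking the estimate on small graphs."""
    total = 0
    for u in adj:
        for v in adj[u]:
            if u < v:
                total += len(adj[u] & adj[v])  # triangles through edge (u, v)
    return total // 3  # each triangle is counted once per edge


def estimate_triangles(edges, num_samples, seed=None):
    """Estimate the triangle count from uniformly sampled edges.

    Assumption (not taken from the paper): each sampled edge is extended
    into a wedge by drawing a uniformly random other neighbor of its
    lower-degree endpoint.
    """
    rng = random.Random(seed)
    adj = build_adjacency(edges)
    edge_list = sorted({(min(u, v), max(u, v)) for u, v in edges if u != v})
    m = len(edge_list)
    if m == 0 or num_samples <= 0:
        return 0.0

    acc = 0.0
    for _ in range(num_samples):
        u, v = rng.choice(edge_list)  # phase 1: uniform edge sample
        if len(adj[u]) > len(adj[v]):
            u, v = v, u  # extend from the lower-degree endpoint
        candidates = len(adj[u]) - 1  # neighbors of u other than v
        if candidates == 0:
            continue  # the edge cannot lie in any triangle
        w = rng.choice([x for x in adj[u] if x != v])  # phase 2: form wedge v-u-w
        if w in adj[v]:  # wedge is closed, so (u, v, w) is a triangle
            acc += candidates  # unbiased estimate of triangles on this edge
    # Average per-edge estimate, scaled to all m edges; divide by 3 because
    # every triangle contributes to three edges.
    return m * acc / (3 * num_samples)
```

On the complete graph with four vertices every wedge is closed, so the sketch returns exactly 4, matching `exact_triangles`. On graphs with skewed degrees the estimate varies from run to run, and its accuracy depends on `num_samples`.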
