A Hybrid Sampling Scheme for Triangle Counting
John Kallaugher, Eric Price

TL;DR
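This paper introduces a hybrid sampling algorithm for estimating triangle counts in graph streams. Its space bound is parameterized by the maximum number of triangles sharing a single edge and the maximum number sharing a single vertex. The scheme extends to counting arbitrary small subgraphs, and lower bounds show that no sampling algorithm can beat it by more than a log factor on simple graphs.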
Contribution
It presents a sampling scheme additionally parameterized by the maximum number of triangles sharing a single vertex, and complements it with lower bounds that are tight up to a log factor on simple graphs.
Findings
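Matches the best known turnstile bounds on all graphs and improves on them for simple graphs like G(n, p) or a set of independent triangles
Any insertion-stream algorithm needs √T space when all triangles share a common vertex, where T is the number of triangles
Any sampling algorithm needs T^{1/3} samples for independent triangles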
Abstract
We study the problem of estimating the number of triangles in a graph stream. No streaming algorithm can get sublinear space on all graphs, so methods in this area bound the space in terms of parameters of the input graph such as the maximum number of triangles sharing a single edge. We give a sampling algorithm that is additionally parameterized by the maximum number of triangles sharing a single vertex. Our bound matches the best known turnstile results in all graphs, and gets better performance on simple graphs like G(n, p) or a set of independent triangles. We complement the upper bound with a lower bound showing that no sampling algorithm can do better on those graphs by more than a log factor. In particular, any insertion stream algorithm must use √T space when all the triangles share a common vertex, and any sampling algorithm must take T^{1/3} samples when…
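To make the idea of a sampling estimator concrete, here is a minimal sketch of a much simpler baseline: uniform edge sampling. It is not the hybrid scheme from the paper. It keeps each streamed edge independently with probability p, counts triangles among the kept edges, and rescales by 1/p^3, since a triangle survives only if all three of its edges are kept. The function names and the in-memory edge list are assumptions made for illustration.

```python
import random


def count_triangles(edges):
    """Exactly count triangles in an undirected simple graph given as (u, v) pairs."""
    adj = {}
    unique_edges = set()
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
        unique_edges.add((min(u, v), max(u, v)))
    # Each triangle is found once per edge, so divide by 3.
    hits = sum(len(adj[u] & adj[v]) for u, v in unique_edges)
    return hits // 3


def estimate_triangles(edge_stream, p, seed=None):
    """Baseline estimator (hypothetical helper): sample edges with probability p.

    A triangle survives sampling with probability p**3, so dividing the
    sampled count by p**3 gives an unbiased estimate of the true count.
    """
    rng = random.Random(seed)
    kept = [e for e in edge_stream if rng.random() < p]
    return count_triangles(kept) / p**3


def triangles_sharing_vertex(t):
    """Build t triangles that all share vertex 0 (one of the cases in the abstract)."""
    edges = []
    for i in range(t):
        a, b = 2 * i + 1, 2 * i + 2
        edges += [(0, a), (0, b), (a, b)]
    return edges
```

For example, `estimate_triangles(triangles_sharing_vertex(1000), p=0.5, seed=1)` returns a noisy estimate of 1000. How noisy such an estimate is depends on how the triangles overlap, which is the kind of graph parameter the abstract uses to bound space.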
Taxonomy
Topics: Machine Learning and Algorithms · Privacy-Preserving Technologies in Data · Data Stream Mining Techniques
