A space efficient streaming algorithm for triangle counting using the birthday paradox
Madhav Jha, C. Seshadhri, Ali Pinar

TL;DR
This paper introduces a space-efficient streaming algorithm for approximating triangle counts and transitivity in large graphs, leveraging the birthday paradox to operate with minimal memory in a single pass.
Contribution
The authors present a single-pass streaming algorithm, based on the birthday paradox, that estimates transitivity and triangle counts while storing only a small fraction of the graph's edges.
Findings
Requires only O(√n) space (n is the number of vertices) when the transitivity is constant and the graph has more edges than wedges
Stores just 60,000 edges for a 200 million edge graph to achieve accurate estimates
Operates in real-time, providing continuous estimates of graph transitivity and triangle count
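For readers unfamiliar with the two quantities being estimated, the sketch below computes them exactly on a tiny in-memory graph. It assumes the standard definitions: a wedge is a path of length two, and transitivity is the fraction of wedges that are closed, i.e. 3 × triangles / wedges. This is not the streaming algorithm from the paper. It only defines the targets that the algorithm approximates, and the function name `exact_triangle_stats` is a hypothetical helper.

```python
from itertools import combinations


def exact_triangle_stats(edges):
    """Return (triangles, wedges, transitivity) for an undirected simple graph.

    Hypothetical helper for illustration; it stores the whole graph, which is
    exactly what a streaming algorithm is designed to avoid.
    """
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    # A vertex of degree d is the center of d * (d - 1) / 2 wedges.
    wedges = sum(len(nbrs) * (len(nbrs) - 1) // 2 for nbrs in adj.values())

    # A wedge is closed when its two endpoints are themselves adjacent.
    closed_wedges = 0
    for nbrs in adj.values():
        for a, b in combinations(nbrs, 2):
            if b in adj[a]:
                closed_wedges += 1

    # Each triangle contributes three closed wedges, one per vertex.
    triangles = closed_wedges // 3
    transitivity = closed_wedges / wedges if wedges else 0.0
    return triangles, wedges, transitivity


# A triangle on vertices 0, 1, 2 with one extra edge hanging off vertex 2.
example_edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
triangles, wedges, transitivity = exact_triangle_stats(example_edges)
print(triangles, wedges, transitivity)
```

On this example there is one triangle and five wedges, three of which are closed, so the transitivity is 3/5.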
Abstract
We design a space efficient algorithm that approximates the transitivity (global clustering coefficient) and total triangle count with only a single pass through a graph given as a stream of edges. Our procedure is based on the classic probabilistic result, the birthday paradox. When the transitivity is constant and there are more edges than wedges (common properties for social networks), we can prove that our algorithm requires O(√n) space (n is the number of vertices) to provide accurate estimates. We run a detailed set of experiments on a variety of real graphs and demonstrate that the memory requirement of the algorithm is a tiny fraction of the graph. For example, even for a graph with 200 million edges, our algorithm stores just 60,000 edges to give accurate results. Being a single pass streaming algorithm, our procedure also maintains a real-time estimate of the…
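The classic birthday paradox that the abstract refers to says that a number of uniform random samples on the order of the square root of the population size is already enough to make a repeated value likely. The sketch below illustrates only that classic result. How the paper applies it to sampled edges and wedges is not described in this summary, so no such mapping is attempted here. The function name `collision_probability` is a hypothetical helper.

```python
def collision_probability(num_samples, num_items):
    """Probability that at least two of `num_samples` uniform draws
    (with replacement) from `num_items` equally likely values coincide.

    Hypothetical helper illustrating the classic birthday paradox only.
    """
    p_all_distinct = 1.0
    for i in range(num_samples):
        # The (i + 1)-th draw must avoid the i values already seen.
        p_all_distinct *= max(num_items - i, 0) / num_items
    return 1.0 - p_all_distinct


# With 365 possible birthdays, 23 people already give better-than-even odds.
p_23 = collision_probability(23, 365)
print(round(p_23, 3))
```

Here 23 is close to √365 ≈ 19, which is the square-root behaviour behind the O(√n) space bound quoted in the abstract.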
Taxonomy
Topics: Data Management and Algorithms · Complex Network Analysis Techniques · Peer-to-Peer Network Technologies
