An Efficient Streaming Algorithm for Approximating Graphlet Distributions
Marco Bressan, T-H. Hubert Chan, Qipeng Kuang, Mauro Sozio

TL;DR
This paper introduces a streaming algorithm that approximates graphlet distributions in fewer passes than previous methods while achieving comparable or better accuracy on large graphs.
Contribution
The authors develop a new streaming algorithm that reduces the number of passes needed to approximate graphlet frequencies, improving on prior work while keeping memory usage near-optimal.
Findings
The algorithm makes only a constant number of passes, breaking the previous logarithmic bound.
It achieves comparable or better approximation accuracy on real-world and synthetic graphs.
It outperforms previous algorithms by orders of magnitude on mildly dense graphs.
Abstract
In recent years, the problem of computing the frequencies of the induced $k$-vertex subgraphs of a graph, or \emph{$k$-graphlets}, has become central. One approach for this problem is to sample $k$-graphlets randomly. Classic algorithms for $k$-graphlet sampling require loading the entire graph into main memory, making them impractical for massive graphs. To bypass this limitation, Bourreau et al. (NeurIPS 2024) introduced a \emph{streaming} algorithm that, through nontrivial techniques, makes only a logarithmic number of passes. In this work we break their logarithmic pass bound by giving an algorithm that makes a constant number of passes. As a consequence of their lower bound, the memory usage of our algorithm is near-optimal. We use this sampling algorithm to obtain an efficient method of…
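To make the quantity being approximated concrete, here is a minimal brute-force sketch of a graphlet distribution on a tiny in-memory graph. It is not the streaming algorithm from the paper. It enumerates every $k$-vertex subset, which is exactly what becomes infeasible on massive graphs. The function names (`graphlet_distribution`, `canonical_form`, `is_connected`) are hypothetical, and the sketch assumes that graphlets are connected induced subgraphs, a common convention that the summary above does not state explicitly.

```python
from collections import Counter
from itertools import combinations, permutations


def canonical_form(nodes, adj):
    """Smallest sorted edge list over all relabelings of the induced subgraph.

    Two induced subgraphs are isomorphic exactly when their forms are equal.
    Costs k! relabelings, so this is only sensible for very small k.
    """
    k = len(nodes)
    best = None
    for perm in permutations(range(k)):
        edges = tuple(sorted(
            (min(perm[i], perm[j]), max(perm[i], perm[j]))
            for i in range(k) for j in range(i + 1, k)
            if nodes[j] in adj[nodes[i]]
        ))
        if best is None or edges < best:
            best = edges
    return best


def is_connected(nodes, adj):
    """Depth-first search restricted to the chosen vertex subset."""
    node_set = set(nodes)
    seen = {nodes[0]}
    stack = [nodes[0]]
    while stack:
        u = stack.pop()
        for v in adj[u] & node_set:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return len(seen) == len(nodes)


def graphlet_distribution(edges, k):
    """Relative frequency of each connected induced k-vertex subgraph type."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    counts = Counter()
    for nodes in combinations(sorted(adj), k):
        if is_connected(nodes, adj):
            counts[canonical_form(nodes, adj)] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {form: c / total for form, c in counts.items()}
```

For example, calling `graphlet_distribution([(0, 1), (1, 2), (0, 2), (2, 3)], 3)` on a triangle with one pendant vertex finds one triangle and two 3-vertex paths. A sampling-based method estimates the same frequencies from random graphlets rather than from full enumeration.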
