Tiered Sampling: An Efficient Method for Approximate Counting Sparse Motifs in Massive Graph Streams

Lorenzo De Stefani; Erisa Terolli; Eli Upfal

arXiv:1710.02108 · cs.DS · October 6, 2017

TL;DR

Tiered Sampling is a single-pass streaming algorithm that estimates the count of sparse motifs, such as 4- and 5-cliques, in massive graphs using a fixed amount of memory, outperforming traditional methods.

Contribution

We propose a novel tiered reservoir sampling technique that improves approximate counting of sparse motifs in graph streams with fixed memory.

Findings

01

Accurately estimates counts of 4- and 5-cliques in large graphs.

02

Uses significantly less memory than existing methods.

03

Demonstrates superior performance on synthetic and real data.

Abstract

We introduce Tiered Sampling, a novel technique for approximately counting sparse motifs in massive graphs whose edges are observed in a stream. Our technique requires only a single pass over the data and uses a memory of fixed size $M$, which can be orders of magnitude smaller than the number of edges. Our method addresses the challenging task of counting sparse motifs: sub-graph patterns that have a low probability of appearing in a sample of $M$ edges in the graph, which is the maximum amount of data available to the algorithm at each step. To obtain an unbiased and low-variance estimate of the count, we partition the available memory into tiers (layers) of reservoir samples. While the base layer is a standard reservoir sample of edges, other layers are reservoir samples of sub-structures of the desired motif. By storing more frequent sub-structures of the motif, we increase the probability of…
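
The tiered layout described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `Reservoir` and `tiered_pass`, the `edge_fraction` split, the choice of triangles as the stored sub-structure, and the convention that every stored item costs one unit of memory are all assumptions made for illustration. The sketch only maintains the two samples; it does not compute the paper's unbiased estimator.

```python
import random


class Reservoir:
    """Uniform sample of at most `capacity` items from a stream (Algorithm R)."""

    def __init__(self, capacity, rng=None):
        self.capacity = capacity
        self.items = []
        self.seen = 0  # number of items offered so far
        self.rng = rng or random.Random()

    def offer(self, item):
        """Offer one stream item; return True if it was stored."""
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(item)
            return True
        # Keep the new item with probability capacity / seen,
        # evicting a uniformly chosen stored item.
        j = self.rng.randrange(self.seen)
        if j < self.capacity:
            self.items[j] = item
            return True
        return False


def tiered_pass(edge_stream, memory, edge_fraction=0.5, seed=0):
    """Single pass that fills two tiers from a stream of (u, v) edges.

    Tier 1 is a reservoir of edges. Tier 2 is a reservoir of triangles,
    used here as a hypothetical sub-structure of a larger clique motif.
    """
    rng = random.Random(seed)
    edge_capacity = max(1, int(memory * edge_fraction))
    edges = Reservoir(edge_capacity, rng)
    triangles = Reservoir(max(1, memory - edge_capacity), rng)

    for u, v in edge_stream:
        # Neighbours of u and v among the currently sampled edges.
        nbrs_u, nbrs_v = set(), set()
        for a, b in edges.items:
            if a == u:
                nbrs_u.add(b)
            elif b == u:
                nbrs_u.add(a)
            if a == v:
                nbrs_v.add(b)
            elif b == v:
                nbrs_v.add(a)
        # Every common neighbour closes a triangle with the new edge.
        for w in sorted(nbrs_u & nbrs_v):
            triangles.offer(frozenset((u, v, w)))
        edges.offer((u, v))

    return edges, triangles
```

When `memory` is large enough to hold the whole stream, both tiers are exact: every edge is kept and every triangle is observed when its last edge arrives. With a smaller `memory`, each tier holds a uniform sample of what was offered to it, and the counters `seen` record how many items each tier was offered.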
