Counting Butterflies over Streaming Bipartite Graphs with Duplicate Edges
Lingkai Meng, Long Yuan, Xuemin Lin, Chengjie Li, Kai Wang, and Wenjie Zhang

TL;DR
This paper introduces DEABC, a novel streaming algorithm for accurately counting butterflies in bipartite graphs with duplicate edges, improving efficiency and accuracy over existing methods.
Contribution
DEABC is a new bucket-based sampling method that handles duplicate edges in streaming bipartite graphs, reducing memory usage and increasing accuracy.
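To illustrate why a hash-driven bucket sample can be insensitive to duplicates, here is a minimal sketch. It is not the DEABC algorithm itself: the `BucketSampler` class, the `edge_hash` helper, and the "keep the lowest-priority edge per bucket" rule are all assumptions made for illustration. The point it shows is that when both the bucket and the priority of an edge are derived from a deterministic hash of the edge, a repeated edge always maps to the same slot with the same priority, so it cannot change the sample.

```python
import hashlib


def edge_hash(edge):
    """Deterministic 64-bit hash of an edge (hypothetical helper)."""
    data = repr(edge).encode("utf-8")
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


class BucketSampler:
    """Illustrative duplicate-insensitive sampler; not the paper's method."""

    def __init__(self, num_buckets):
        self.num_buckets = num_buckets
        # Each slot holds (priority, edge) or None.
        self.buckets = [None] * num_buckets

    def offer(self, edge):
        h = edge_hash(edge)
        bucket = h % self.num_buckets      # which bucket the edge falls into
        priority = h // self.num_buckets   # rank of the edge within that bucket
        current = self.buckets[bucket]
        # Keep the edge with the smallest priority. A duplicate has the same
        # priority as the stored copy, so it never displaces anything.
        if current is None or priority < current[0]:
            self.buckets[bucket] = (priority, edge)

    def sample(self):
        return {entry[1] for entry in self.buckets if entry is not None}


stream = [("u1", "v1"), ("u2", "v1"), ("u1", "v1"), ("u2", "v2"), ("u1", "v1")]

with_duplicates = BucketSampler(num_buckets=4)
for e in stream:
    with_duplicates.offer(e)

without_duplicates = BucketSampler(num_buckets=4)
for e in sorted(set(stream)):
    without_duplicates.offer(e)

same_sample = with_duplicates.sample() == without_duplicates.sample()
```

In this sketch the memory footprint is fixed by `num_buckets`, and `same_sample` compares the sample drawn from the raw stream with the one drawn from its distinct edges.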
Findings
DEABC provides unbiased butterfly count estimates with proven variance bounds.
DEABC outperforms existing algorithms in memory efficiency and accuracy on real-world data.
DEABC achieves higher throughput than prior methods.
Abstract
Bipartite graphs are commonly used to model relationships between two distinct types of entities in real-world applications, such as user-product interactions, user-movie ratings, and collaborations between authors and publications. A butterfly (a 2x2 biclique) is a critical substructure in bipartite graphs, playing a significant role in tasks like community detection, fraud detection, and link prediction. As more real-world data is presented in a streaming format, efficiently counting butterflies in streaming bipartite graphs has become increasingly important. However, most existing algorithms assume that duplicate edges are absent, an assumption that rarely holds in real-world graph streams. As a result, they tend to sample edges that appear multiple times, leading to inaccurate results. The only algorithm designed to handle duplicate edges is FABLE, but it suffers from significant…
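To make the butterfly definition concrete, the following is a minimal sketch of an exact (non-streaming) counter that ignores duplicate edges. It is a baseline for illustration only, not DEABC or FABLE. The function name `count_butterflies` and the convention that each edge is a `(left_vertex, right_vertex)` pair are assumptions. The sketch relies on the fact that two left vertices sharing c right neighbors contribute c(c-1)/2 butterflies.

```python
from collections import defaultdict
from itertools import combinations


def count_butterflies(edge_stream):
    """Exact butterfly count over the distinct edges of a bipartite stream.

    Assumes each edge is a (left_vertex, right_vertex) pair.
    """
    distinct_edges = set(edge_stream)  # duplicate edges collapse here

    # Adjacency sets for the left side only.
    neighbors = defaultdict(set)
    for left, right in distinct_edges:
        neighbors[left].add(right)

    total = 0
    # Every pair of left vertices with c common right neighbors
    # forms c * (c - 1) / 2 butterflies (2x2 bicliques).
    for a, b in combinations(neighbors, 2):
        common = len(neighbors[a] & neighbors[b])
        total += common * (common - 1) // 2
    return total


# One butterfly on {u1, u2} x {v1, v2}; two of its edges are repeated.
stream = [
    ("u1", "v1"), ("u1", "v2"),
    ("u2", "v1"), ("u2", "v2"),
    ("u1", "v1"), ("u2", "v2"),
]
butterflies = count_butterflies(stream)
```

Because the edges pass through a set first, the repeated `("u1", "v1")` and `("u2", "v2")` do not inflate the count. This exact approach stores every distinct edge, which is the memory cost that sampling-based streaming estimators are designed to avoid.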
