GraphZeppelin: Storage-Friendly Sketching for Connected Components on Dynamic Graph Streams
David Tench, Evan West, Victor Zhang, Michael A. Bender, Abiyaz Chowdhury, J. Ahmed Dellas, Martin Farach-Colton, Tyler Seip, Kenny Zhang

TL;DR
GraphZeppelin is a high-performance streaming system that uses new linear sketching data structures to compute connected components on large, dynamic graphs that exceed available RAM, while processing millions of edge updates per second.
Contribution
The paper presents GraphZeppelin, a novel system that employs CubeSketches for space-efficient, fast connected components computation on massive dynamic graphs beyond RAM capacity.
Findings
Processes millions of edge updates per second
Uses space asymptotically smaller than lossless graph representations
Enables analysis of graphs larger than available RAM
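To give a feel for why a linear sketch can handle both insertions and deletions in little space, here is a toy illustration. It is not the CubeSketch data structure from the paper; it is a minimal sketch that assumes each node keeps a single integer, the XOR of the ids of its incident edges. The names `edge_id`, `ToyNodeSketches`, and `decode` are hypothetical and chosen only for this example. Because XOR is its own inverse, deleting an edge is the same operation as inserting it, and combining the accumulators of a set of nodes cancels every edge inside the set.

```python
def edge_id(u, v, n):
    """Encode an undirected edge as a nonzero integer (hypothetical encoding)."""
    a, b = min(u, v), max(u, v)
    return a * n + b + 1  # +1 so that no edge encodes to 0


def decode(e, n):
    """Invert edge_id: recover the (smaller, larger) endpoint pair."""
    return divmod(e - 1, n)


class ToyNodeSketches:
    """One XOR accumulator per node; a toy stand-in, not a real CubeSketch."""

    def __init__(self, n):
        self.n = n
        self.acc = [0] * n

    def update(self, u, v):
        # Insertion and deletion are the same toggle, since x ^ e ^ e == x.
        e = edge_id(u, v, self.n)
        self.acc[u] ^= e
        self.acc[v] ^= e

    def merged(self, nodes):
        # XOR of the node accumulators. Edges with both endpoints in `nodes`
        # appear twice and cancel, leaving only edges that leave the set.
        out = 0
        for x in nodes:
            out ^= self.acc[x]
        return out


sk = ToyNodeSketches(4)
sk.update(0, 1)  # insert (0, 1)
sk.update(1, 2)  # insert (1, 2)
sk.update(2, 3)  # insert (2, 3)
sk.update(2, 3)  # delete (2, 3): same operation as the insert
crossing = sk.merged([0, 1])
```

In this run exactly one edge leaves the set {0, 1}, so `decode(crossing, 4)` recovers it. With several crossing edges the single XOR value is no longer decodable, which is the gap that a real sampling sketch has to close with additional randomized structure.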
Abstract
Finding the connected components of a graph is a fundamental problem with uses throughout computer science and engineering. The task of computing connected components becomes more difficult when graphs are very large, or when they are dynamic, meaning the edge set changes over time subject to a stream of edge insertions and deletions. A natural approach to computing the connected components on a large, dynamic graph stream is to buy enough RAM to store the entire graph. However, the requirement that the graph fit in RAM is prohibitive for very large graphs. Thus, there is an unmet need for systems that can process dense dynamic graphs, especially when those graphs are larger than available RAM. We present a new high-performance streaming graph-processing system for computing the connected components of a graph. This system, which we call GraphZeppelin, uses new linear sketching data…
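The "natural approach" the abstract describes, keeping the whole graph in RAM, can be sketched as follows. This is a lossless baseline for comparison, not GraphZeppelin's method. The function name `connected_components` and the `(op, u, v)` stream format are assumptions made for this example.

```python
def connected_components(num_nodes, stream):
    """Baseline: store every current edge, then run union-find once at query time."""
    edges = set()
    for op, u, v in stream:
        e = (min(u, v), max(u, v))  # normalize the undirected edge
        if op == "insert":
            edges.add(e)
        else:  # any other op is treated as a deletion
            edges.discard(e)

    parent = list(range(num_nodes))

    def find(x):
        # Iterative find with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv

    groups = {}
    for x in range(num_nodes):
        groups.setdefault(find(x), []).append(x)
    return sorted(groups.values())


stream = [
    ("insert", 0, 1),
    ("insert", 1, 2),
    ("insert", 3, 4),
    ("delete", 1, 2),
]
components = connected_components(5, stream)
```

The `edges` set grows with the number of edges in the graph, which for a dense graph is quadratic in the number of nodes. That memory footprint is the cost the abstract calls prohibitive for very large graphs.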
Taxonomy
Topics: Caching and Content Delivery · Cloud Computing and Resource Management · Graph Theory and Algorithms
