Accurate and Fast Approximate Graph Pattern Mining at Scale
Anna Arpaci-Dusseau, Zixiang Zhou, Xuhao Chen

TL;DR
This paper introduces ScaleGPM, a novel approximate graph pattern mining system that achieves fast, stable, and theoretically guaranteed convergence, effectively handling billion-scale graphs with significant speed improvements over existing methods.
Contribution
ScaleGPM introduces an on-the-fly convergence detection mechanism, along with two techniques, eager-verify and hybrid sampling, that improve the efficiency and reliability of approximate graph pattern mining.
Findings
Achieves up to 610,169x speedup over Arya.
Handles billion-scale graphs in seconds.
Provides theoretical confidence guarantees for convergence.
Abstract
Approximate graph pattern mining (A-GPM) is an important data analysis tool for many graph-based applications. Existing sampling-based A-GPM systems provide automation and generalization over a wide variety of use cases. However, two major obstacles prevent existing A-GPM systems from being adopted in practice. First, the termination mechanism that decides when to end sampling lacks theoretical backing for its confidence, and is unstable and slow in practice. Second, these systems suffer from poor performance in "needle-in-the-hay" cases, because a huge number of samples is required to converge, given the extremely low hit rate of their fixed sampling schemes. We build ScaleGPM, an accurate and fast A-GPM system that removes the two obstacles. First, we propose a novel on-the-fly convergence detection mechanism to achieve stable termination and provide theoretical…
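To make the two obstacles concrete, here is a minimal Python sketch of sampling-based pattern counting with a stopping rule. It is not ScaleGPM's algorithm: the pattern (triangles), the uniform vertex-triple sampler, the normal-approximation confidence interval, and every name and parameter (`estimate_triangles`, `rel_err`, `z`, `batch`, `max_samples`) are assumptions chosen for illustration. The graph is assumed to be a dict mapping each vertex to a set of neighbors.

```python
import math
import random
from itertools import combinations


def exact_triangles(adj):
    """Brute-force triangle count, for checking the estimate on small graphs."""
    return sum(
        1
        for u, v, w in combinations(sorted(adj), 3)
        if v in adj[u] and w in adj[u] and w in adj[v]
    )


def estimate_triangles(adj, rel_err=0.1, z=1.96, batch=1000,
                       max_samples=2_000_000, seed=0):
    """Estimate the triangle count by sampling vertex triples uniformly.

    Sampling stops once the normal-approximation confidence half-width
    drops below rel_err times the current hit rate, or at max_samples.
    This is a textbook stopping rule, used here only as a stand-in.
    """
    rng = random.Random(seed)
    nodes = sorted(adj)
    total_triples = math.comb(len(nodes), 3)
    hits = 0
    samples = 0
    p = 0.0
    while samples < max_samples:
        for _ in range(batch):
            u, v, w = rng.sample(nodes, 3)
            if v in adj[u] and w in adj[u] and w in adj[v]:
                hits += 1
        samples += batch
        p = hits / samples  # fraction of sampled triples that are triangles
        if hits > 0:
            half_width = z * math.sqrt(p * (1 - p) / samples)
            if half_width <= rel_err * p:
                break
    # Scale the hit rate by the number of possible triples.
    return p * total_triples, samples
```

Under this rule the number of samples needed grows roughly as (1 - p) / (p * rel_err^2), where p is the hit rate. On a sparse graph p can be tiny, so a fixed uniform sampler like this one needs a very large number of samples, which is the "needle-in-the-hay" problem the abstract describes.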
Taxonomy
Topics: Data Mining Algorithms and Applications · Graph Theory and Algorithms
