From NoSQL Accumulo to NewSQL Graphulo: Design and Utility of Graph Algorithms inside a BigTable Database
Dylan Hutchison, Jeremy Kepner, Vijay Gadepally, Bill Howe

TL;DR
This paper introduces Graphulo, a library that enables execution of graph algorithms within a BigTable database, demonstrating its performance and analyzing when in-database computation is advantageous.
Contribution
It presents the design and implementation of Graphulo for running GraphBLAS kernels inside Apache Accumulo, bridging NoSQL and NewSQL paradigms for graph analytics.
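GraphBLAS kernels operate on sparse matrices, and a sparse matrix maps naturally onto BigTable's sorted (row, column) -> value layout. The Python sketch below uses a simplified in-memory stand-in for an Accumulo table: a sorted list of `((row, col), value)` entries. It shows one way to compute a sparse matrix product C = A^T B as a sum of outer products over the rows the two tables share. The semiring's addition and multiplication are passed as parameters, as GraphBLAS allows. The names `to_table`, `rows`, and `table_mult` are hypothetical illustrations, not Graphulo's API.

```python
from collections import defaultdict


def to_table(triples):
    """Sort (row, col, value) triples by key, mimicking a BigTable's sorted storage."""
    return sorted(((r, c), v) for r, c, v in triples)


def rows(table):
    """Group sorted entries by row key into {row: {col: value}}."""
    grouped = defaultdict(dict)
    for (r, c), v in table:
        grouped[r][c] = v
    return grouped


def table_mult(a, b, plus=lambda x, y: x + y, times=lambda x, y: x * y):
    """Compute C = A^T B, where C[i, j] = plus over k of times(A[k, i], B[k, j]).

    For each row key k present in both tables, the outer product of row k of A
    and row k of B is accumulated into C using the given semiring operations.
    """
    a_rows, b_rows = rows(a), rows(b)
    acc = {}
    for k in a_rows.keys() & b_rows.keys():  # only row keys present in both tables
        for i, a_val in a_rows[k].items():
            for j, b_val in b_rows[k].items():
                prod = times(a_val, b_val)
                acc[(i, j)] = plus(acc[(i, j)], prod) if (i, j) in acc else prod
    return sorted(acc.items())  # return C in sorted key order, like a table scan
```

For example, with `A = to_table([("a", "b", 1), ("a", "c", 1), ("b", "c", 1)])` as the adjacency matrix of a three-vertex directed graph, `table_mult(A, A)` counts, for each vertex pair (i, j), how many vertices point to both i and j. Keeping the kernel on sorted key-value entries, rather than on a dense array, is what lets this style of computation run where the data is stored.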
Findings
Memory requirements influence whether in-database or main-memory execution performs better.
I/O costs are critical in determining execution speed.
In-database graph algorithms can outperform external main-memory matrix math systems under certain conditions.
Abstract
Google BigTable's scale-out design for distributed key-value storage inspired a generation of NoSQL databases. Recently the NewSQL paradigm emerged in response to analytic workloads that demand distributed computation local to data storage. Many such analytics take the form of graph algorithms, a trend that motivated the GraphBLAS initiative to standardize a set of matrix math kernels for building graph algorithms. In this article we show how it is possible to implement the GraphBLAS kernels in a BigTable database by presenting the design of Graphulo, a library for executing graph algorithms inside the Apache Accumulo database. We detail the Graphulo implementation of two graph algorithms and conduct experiments comparing their performance to two main-memory matrix math systems. Our results shed insight into the conditions that determine when executing a graph algorithm is faster inside…
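To illustrate how a graph algorithm "takes the form of" matrix math kernels, the sketch below expresses breadth-first search as repeated sparse vector-matrix products over the Boolean (OR, AND) semiring, followed by a mask that removes vertices already visited. BFS is a standard GraphBLAS example chosen here for illustration. It is not necessarily one of the two algorithms the paper details, and the function `bfs_levels` is hypothetical.

```python
def bfs_levels(edges, source):
    """Breadth-first search written as repeated sparse vector-matrix products.

    The frontier is a sparse Boolean vector. Each step computes frontier * A
    over the (OR, AND) semiring, which amounts to OR-ing together the rows of
    the adjacency matrix A selected by the frontier, then masks out vertices
    that already have a level.
    """
    # Sparse adjacency matrix stored row-wise: {u: set of v with A[u, v] = 1}.
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)

    levels = {source: 0}
    frontier = {source}
    depth = 0
    while frontier:
        depth += 1
        # Vector-matrix product: union of the rows of A indexed by the frontier.
        reached = set()
        for u in frontier:
            reached |= adj.get(u, set())
        # Mask: keep only newly discovered vertices.
        frontier = {v for v in reached if v not in levels}
        for v in frontier:
            levels[v] = depth
    return levels
```

For a graph with edges a->b, a->c, b->d, c->d, d->e, calling `bfs_levels` from `a` assigns level 1 to `b` and `c`, level 2 to `d`, and level 3 to `e`. Each loop iteration touches only the rows selected by the frontier, which mirrors a scan over a subset of row keys in a BigTable-style table.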
