Distributed Triangle Counting in the Graphulo Matrix Math Library
Dylan Hutchison

TL;DR
This paper adapts two triangle counting algorithms for large graphs to the Graphulo library within Apache Accumulo, enabling distributed, server-side processing, and analyzes their performance on power law graphs.
Contribution
It adapts two triangle counting algorithms, one based on the adjacency matrix and one that also uses the incidence matrix, to Graphulo for distributed graph analysis inside Accumulo.
Findings
Similar performance profiles for both algorithms on power law graphs
Data skew increasingly bottlenecks performance on these graphs
Motivates development of skew-aware hybrid algorithms
Abstract
Triangle counting is a key algorithm for large graph analysis. The Graphulo library provides a framework for implementing graph algorithms on the Apache Accumulo distributed database. In this work we adapt two algorithms for counting triangles, one that uses the adjacency matrix and another that also uses the incidence matrix, to the Graphulo library for server-side processing inside Accumulo. Cloud-based experiments show a similar performance profile for these different approaches on the family of power law Graph500 graphs, for which data skew increasingly becomes a bottleneck. These results motivate the design of skew-aware hybrid algorithms that we propose for future work.
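To make the two matrix formulations concrete, here is a minimal single-machine sketch in plain Python. It assumes one common formulation of each approach and is not necessarily the exact variant adapted in the paper: the adjacency version sums the entries of (U·U) masked by U, where U is the strictly upper-triangular adjacency matrix, and the adjacency-plus-incidence version counts entries equal to 2 in A·E and divides by 3. The function names and the toy edge list are hypothetical, and nothing here uses the Graphulo or Accumulo APIs.

```python
def triangles_adjacency(edges, n):
    """Count triangles as sum((U @ U) * U), U strictly upper triangular.

    Assumes a simple undirected graph (no self-loops, no duplicate edges).
    """
    U = [[0] * n for _ in range(n)]
    for a, b in edges:
        i, j = min(a, b), max(a, b)
        U[i][j] = 1
    count = 0
    for i in range(n):
        for j in range(n):
            if U[i][j]:  # mask: only keep (U @ U)[i][j] where an edge exists
                count += sum(U[i][k] * U[k][j] for k in range(n))
    return count


def triangles_incidence(edges, n):
    """Count triangles as (number of entries equal to 2 in A @ E) / 3.

    A is the symmetric adjacency matrix and E the vertex-by-edge
    incidence matrix. Same simple-graph assumption as above.
    """
    m = len(edges)
    A = [[0] * n for _ in range(n)]
    E = [[0] * m for _ in range(n)]
    for e, (a, b) in enumerate(edges):
        A[a][b] = A[b][a] = 1
        E[a][e] = E[b][e] = 1
    twos = 0
    for v in range(n):
        for e in range(m):
            # (A @ E)[v][e] = number of endpoints of edge e adjacent to v
            if sum(A[v][k] * E[k][e] for k in range(n)) == 2:
                twos += 1
    # Each triangle is seen once per (vertex, opposite edge) pair.
    return twos // 3


# Toy graph: triangles {0, 1, 2} and {1, 2, 3}, plus a pendant edge (3, 4).
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4)]
print(triangles_adjacency(edges, 5), triangles_incidence(edges, 5))
```

Both functions return 2 on the toy graph. In this dense form the work for a vertex grows with its degree, which gives some intuition for why high-degree vertices in power law graphs lead to skewed workloads.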
