Exploration of Fine-Grained Parallelism for Load Balancing Eager K-truss on GPU and CPU
Mark Blanco, Tze Meng Low, Kyungjoo Kim

TL;DR
This paper introduces a fine-grained parallel approach to the support computation in the Eager K-truss algorithm, improving load balance and performance on both CPU and GPU architectures.
Contribution
It presents a novel fine-grained parallel execution method for Eager K-truss, enhancing load balancing and enabling efficient GPU implementation.
Findings
Up to 1.48x speedup on CPU
Up to 16.92x speedup on GPU
Improved load balancing and parallelism
Abstract
In this work we present a performance exploration on Eager K-truss, a linear-algebraic formulation of the K-truss graph algorithm. We address performance issues related to load imbalance of parallel tasks in symmetric, triangular graphs by presenting a fine-grained parallel approach to executing the support computation. This approach also increases available parallelism, making it amenable to GPU execution. We demonstrate our fine-grained parallel approach using implementations in Kokkos and evaluate them on an Intel Skylake CPU and an Nvidia Tesla V100 GPU. Overall, we observe between a 1.26-1.48x improvement on the CPU and a 9.97-16.92x improvement on the GPU due to our fine-grained parallel formulation.
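The abstract does not spell out the decomposition, so the Python sketch below is only an illustration under stated assumptions. It assumes the graph is stored as the upper triangle of its symmetric adjacency matrix (row `i` lists neighbors `j > i`). It also assumes an "eager" update in which each triangle `i < j < k` is found once and the support of all three of its edges is incremented. Finally, it takes "fine-grained" to mean one task per nonzero (edge) rather than one task per row. All function names are hypothetical, the loops run serially, and the per-task work count is a rough proxy, not a measured cost. This is not the authors' Kokkos implementation.

```python
def build_upper_rows(edges, n):
    """Upper-triangular adjacency: rows[i] holds the sorted neighbors j > i."""
    rows = [set() for _ in range(n)]
    for u, v in edges:
        if u != v:
            rows[min(u, v)].add(max(u, v))
    return [sorted(r) for r in rows]


def edge_task(rows, i, j, support):
    """Process one nonzero (i, j), i < j, with an eager triangle update.

    Every k > j present in both row i and row j closes the triangle
    i < j < k, so the supports of (i, j), (i, k) and (j, k) are all
    incremented here. A parallel version would need atomic increments,
    since different tasks can update the same edge.
    Returns a rough work proxy: the lengths of the two lists intersected.
    """
    tail_i = [k for k in rows[i] if k > j]
    for k in set(tail_i).intersection(rows[j]):
        support[(i, j)] += 1
        support[(i, k)] += 1
        support[(j, k)] += 1
    return len(tail_i) + len(rows[j])


def compute_support(rows, fine_grained=True):
    """Return (support per edge, work proxy per task)."""
    support = {(i, j): 0 for i, r in enumerate(rows) for j in r}
    if fine_grained:
        # Hypothetical fine-grained split: one task per nonzero (edge).
        tasks = [[(i, j)] for i, r in enumerate(rows) for j in r]
    else:
        # Coarse-grained split: one task per row, so long rows become long tasks.
        tasks = [[(i, j) for j in r] for i, r in enumerate(rows) if r]
    # Each task would be one parallel work item; here they run serially.
    work = [sum(edge_task(rows, i, j, support) for i, j in t) for t in tasks]
    return support, work


def k_truss(edges, n, k):
    """Repeatedly drop edges whose support is below k - 2."""
    rows = build_upper_rows(edges, n)
    while True:
        support, _ = compute_support(rows)
        keep = {e for e, s in support.items() if s >= k - 2}
        if len(keep) == len(support):
            return sorted(keep)
        rows = [[j for j in r if (i, j) in keep] for i, r in enumerate(rows)]


if __name__ == "__main__":
    # 4-clique on vertices 0-3 plus a pendant edge 3-4.
    edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4)]
    rows = build_upper_rows(edges, 5)
    _, coarse = compute_support(rows, fine_grained=False)
    _, fine = compute_support(rows, fine_grained=True)
    print("max row task:", max(coarse), "max edge task:", max(fine))
    print("4-truss:", k_truss(edges, 5, 4))
```

On this small graph both splits give identical supports. However, the largest per-row task (work proxy 7, for row 0) is larger than the largest per-edge task (4). That gap is the kind of imbalance a finer split is meant to reduce. `k_truss(edges, 5, 4)` drops the pendant edge and keeps the six clique edges.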
