Exploration of Fine-Grained Parallelism for Load Balancing Eager K-truss on GPU and CPU

Mark Blanco; Tze Meng Low; Kyungjoo Kim

arXiv:2009.07929·cs.DC·September 18, 2020

TL;DR

This paper introduces a fine-grained parallel approach to the support computation in Eager K-truss, improving load balancing and yielding speedups of up to 1.48x on an Intel Skylake CPU and up to 16.92x on an Nvidia Tesla V100 GPU.

Contribution

It presents a novel fine-grained parallel execution method for Eager K-truss, enhancing load balancing and enabling efficient GPU implementation.

Findings

01. Up to 1.48x speedup on CPU

02. Up to 16.92x speedup on GPU

03. Improved load balancing and parallelism

Abstract

In this work we present a performance exploration on Eager K-truss, a linear-algebraic formulation of the K-truss graph algorithm. We address performance issues related to load imbalance of parallel tasks in symmetric, triangular graphs by presenting a fine-grained parallel approach to executing the support computation. This approach also increases available parallelism, making it amenable to GPU execution. We demonstrate our fine-grained parallel approach using implementations in Kokkos and evaluate them on an Intel Skylake CPU and an Nvidia Tesla V100 GPU. Overall, we observe between a 1.26-1.48x improvement on the CPU and a 9.97-16.92x improvement on the GPU due to our fine-grained parallel formulation.
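
To make the coarse-grained versus fine-grained distinction concrete, the following is a minimal serial sketch in Python, not the authors' Kokkos implementation. It assumes the graph is stored as the strictly upper-triangular part of its adjacency matrix in CSR form, and that the support of an edge is the number of triangles containing it. The function names (`build_upper_csr`, `edge_task`, `support_coarse`, `support_fine`) are hypothetical and chosen for illustration only. The coarse-grained version treats each row as one task, while the fine-grained version treats each nonzero (edge) as one task, which gives more, smaller units of work.

```python
def build_upper_csr(n, edges):
    """Build CSR arrays for the strictly upper-triangular adjacency matrix."""
    rows = [[] for _ in range(n)]
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        a, b = (u, v) if u < v else (v, u)
        rows[a].append(b)
    row_ptr, col_idx = [0], []
    for r in rows:
        col_idx.extend(sorted(set(r)))  # sorted, de-duplicated neighbors
        row_ptr.append(len(col_idx))
    return row_ptr, col_idx


def edge_task(row_ptr, col_idx, support, i, p):
    """Process nonzero p, i.e. edge (i, j): intersect the rest of row i with row j."""
    j = col_idx[p]
    a, a_end = p + 1, row_ptr[i + 1]
    b, b_end = row_ptr[j], row_ptr[j + 1]
    while a < a_end and b < b_end:
        if col_idx[a] < col_idx[b]:
            a += 1
        elif col_idx[a] > col_idx[b]:
            b += 1
        else:
            # Common neighbor k closes triangle (i, j, k); update all three
            # edges at once. A parallel version would need atomic increments.
            support[p] += 1  # edge (i, j)
            support[a] += 1  # edge (i, k)
            support[b] += 1  # edge (j, k)
            a += 1
            b += 1


def support_coarse(row_ptr, col_idx):
    """Coarse-grained: one task per row (work per task varies with row length)."""
    support = [0] * len(col_idx)
    for i in range(len(row_ptr) - 1):  # each iteration is one task
        for p in range(row_ptr[i], row_ptr[i + 1]):
            edge_task(row_ptr, col_idx, support, i, p)
    return support


def support_fine(row_ptr, col_idx):
    """Fine-grained: one task per nonzero, so tasks are smaller and more numerous."""
    n = len(row_ptr) - 1
    support = [0] * len(col_idx)
    # Map each nonzero back to its row so every task is self-describing.
    row_of = [i for i in range(n) for _ in range(row_ptr[i], row_ptr[i + 1])]
    for p in range(len(col_idx)):  # each iteration is one task
        edge_task(row_ptr, col_idx, support, row_of[p], p)
    return support


# A 4-clique on vertices 0..3 plus a pendant edge (3, 4).
row_ptr, col_idx = build_upper_csr(
    5, [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4)]
)
coarse = support_coarse(row_ptr, col_idx)
fine = support_fine(row_ptr, col_idx)
```

Both variants compute the same supports; they differ only in how the work is partitioned. In this sketch the row-level loop has 5 tasks with uneven amounts of work, whereas the edge-level loop has 7 tasks, one per stored nonzero.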
