Overcoming Limitations of GPGPU-Computing in Scientific Applications

Connor Kenyon; Glenn Volkema; Gaurav Khanna

arXiv:1905.05175·physics.comp-ph·May 15, 2019

TL;DR

This paper investigates two ways around the PCIe bandwidth bottleneck in GPGPU computing, the NVIDIA NVLink interconnect and zero-copy algorithms on shared-memory devices, and benchmarks each on scientific application kernels.

Contribution

It evaluates two approaches to overcoming PCIe bandwidth constraints in GPGPU systems: the NVIDIA NVLink interconnect and zero-copy algorithms for shared-memory HSA devices.

Findings

01 NVIDIA NVLink improves data transfer rates over PCIe.

02 Zero-copy algorithms reduce data transfer overhead.

03 Performance gains vary across different scientific kernels.

Abstract

The performance of discrete general-purpose graphics processing units (GPGPUs) has been improving at a rapid pace. The PCIe interconnect that carries data between the system host memory and the GPU has not improved as quickly, leaving a gap in performance because the GPU sits idle while waiting for PCIe data transfers. In this article, we explore two alternatives to the limited PCIe bandwidth: the NVIDIA NVLink interconnect, and zero-copy algorithms for shared-memory Heterogeneous System Architecture (HSA) devices. The OpenCL SHOC benchmark suite is used to measure the performance of each device on various scientific application kernels.
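
The bottleneck described in the abstract can be illustrated with a rough back-of-the-envelope model. The sketch below is not taken from the paper: it assumes that transfer and compute do not overlap, that a copy-based device pays `bytes / bandwidth` for its transfers, and that a zero-copy shared-memory device pays no explicit transfer cost. The function names and the bandwidth and kernel-time values in the usage lines are hypothetical placeholders, not measured results.

```python
def offload_time(bytes_moved, link_gb_per_s, kernel_seconds):
    """Estimate wall time for one offload: data transfer plus kernel execution.

    Assumption: transfer and compute are strictly sequential (no overlap).
    Passing link_gb_per_s=None models a zero-copy shared-memory device,
    where no explicit host-to-device copy is issued.
    """
    if link_gb_per_s is None:
        transfer_seconds = 0.0
    else:
        transfer_seconds = bytes_moved / (link_gb_per_s * 1e9)
    return transfer_seconds + kernel_seconds


def idle_fraction(bytes_moved, link_gb_per_s, kernel_seconds):
    """Fraction of the total time the GPU spends waiting on data transfer."""
    total = offload_time(bytes_moved, link_gb_per_s, kernel_seconds)
    return (total - kernel_seconds) / total


if __name__ == "__main__":
    data = 16e9    # hypothetical: 16 GB moved per offload
    kernel = 1.0   # hypothetical: 1 s of pure kernel time
    # Hypothetical link speeds in GB/s; None stands for zero-copy.
    for label, bw in [("slower link", 16.0), ("faster link", 64.0), ("zero-copy", None)]:
        print(label, offload_time(data, bw, kernel), idle_fraction(data, bw, kernel))
```

In this simplified model, a faster interconnect shrinks the transfer term while zero-copy removes it, which is why the benefit of either approach would depend on how much data a given kernel moves relative to how long it computes.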

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Distributed and Parallel Computing Systems · Advanced Data Storage Technologies