# Overcoming Limitations of GPGPU-Computing in Scientific Applications

**Authors:** Connor Kenyon, Glenn Volkema, Gaurav Khanna

arXiv: 1905.05175 · 2019-05-15

## TL;DR

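This paper investigates two ways around the PCIe bandwidth bottleneck in GPGPU computing: the NVIDIA NVLink interconnect, and zero-copy algorithms on shared-memory Heterogeneous System Architecture (HSA) devices. Both are measured with the OpenCL SHOC benchmark suite on scientific application kernels.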

## Contribution

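It evaluates two approaches to overcoming PCIe bandwidth constraints in GPGPU systems: the NVIDIA NVLink interconnect, and zero-copy algorithms for shared-memory HSA devices. Each device is measured on scientific application kernels from the OpenCL SHOC benchmark suite.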

## Key findings

- NVIDIA NVLink improves data transfer rates over PCIe.
- Zero-copy algorithms reduce data transfer overhead.
- Performance gains vary across different scientific kernels.
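
One way to see why the gains differ from kernel to kernel is a simple timing model, sketched here in Python. It is not taken from the paper: it assumes host-device transfers and kernel execution do not overlap, and every number in it (data size, link bandwidths, kernel times) is a hypothetical placeholder, not a measured result. Zero-copy is modeled as moving zero bytes over the interconnect.

```python
def offload_time(bytes_moved, bandwidth_bytes_per_s, kernel_seconds):
    """Total time = transfer time + kernel time (assumes no overlap)."""
    transfer_seconds = bytes_moved / bandwidth_bytes_per_s
    return transfer_seconds + kernel_seconds


def speedup_from_faster_link(bytes_moved, slow_bw, fast_bw, kernel_seconds):
    """Ratio of total time on the slow link to total time on the fast link."""
    slow = offload_time(bytes_moved, slow_bw, kernel_seconds)
    fast = offload_time(bytes_moved, fast_bw, kernel_seconds)
    return slow / fast


# Hypothetical values chosen only to illustrate the model.
DATA_BYTES = 1e9   # 1 GB moved between host and device
SLOW_LINK = 10e9   # placeholder "PCIe-like" bandwidth, bytes/s
FAST_LINK = 40e9   # placeholder "faster interconnect" bandwidth, bytes/s

# A transfer-bound kernel (short compute) gains a lot from the faster link;
# a compute-bound kernel (long compute) gains little.
transfer_bound = speedup_from_faster_link(DATA_BYTES, SLOW_LINK, FAST_LINK, 0.05)
compute_bound = speedup_from_faster_link(DATA_BYTES, SLOW_LINK, FAST_LINK, 0.4)

# Zero-copy in this model: nothing crosses the link, so only kernel time remains.
zero_copy_total = offload_time(0, SLOW_LINK, 0.05)

if __name__ == "__main__":
    print(f"transfer-bound speedup: {transfer_bound:.2f}x")
    print(f"compute-bound speedup:  {compute_bound:.2f}x")
    print(f"zero-copy total time:   {zero_copy_total:.3f} s")
```

Under these assumptions, the benefit of a faster link or of zero-copy shrinks as kernel time grows relative to transfer time, which is consistent with gains that vary by kernel. Real devices differ in ways the model ignores, for example overlapped transfers and the lower memory bandwidth typical of shared-memory devices.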

## Abstract

The performance of discrete general-purpose graphics processing units (GPGPUs) has been improving at a rapid pace. The PCIe interconnect that controls the communication of data between the system host memory and the GPU has not improved as quickly, leaving a gap in performance due to GPU downtime while waiting for PCIe data transfer. In this article, we explore two alternatives to the limited PCIe bandwidth: the NVIDIA NVLink interconnect, and zero-copy algorithms for shared-memory Heterogeneous System Architecture (HSA) devices. The OpenCL SHOC benchmark suite is used to measure the performance of each device on various scientific application kernels.

---
Source: https://tomesphere.com/paper/1905.05175