Accelerating GPU-Based Out-of-Core Stencil Computation with On-the-Fly Compression

Jingcheng Shen; Yifan Wu; Masao Okita; Fumihiko Ino

arXiv:2109.05410·cs.DC·September 14, 2021

Open Access

TL;DR

This paper presents an on-the-fly compression technique for GPU-based out-of-core stencil computation. By shrinking the data moved between the CPU and GPU, it achieves a 1.2x speedup over the same computation without compression, with negligible precision loss up to 4,320 time steps.

Contribution

It introduces a new compression method that handles data dependencies and supports pipelining, enhancing GPU out-of-core stencil computation performance.

Findings

01 Achieved a 1.2x speedup over a method without compression.

02 Maintained negligible precision loss up to 4,320 time steps.

03 Supported overlapping data transfer with computation through a modified GPU compression library.
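
To make the third finding concrete, here is a minimal timing sketch of why overlapping transfers with computation helps, and why shrinking transfers can help further. It is a toy model, not the authors' implementation: the function names, the three-stage split (host-to-device copy, kernel, device-to-host copy), the assumption that each stage has its own engine, and all timing values are illustrative assumptions that do not come from the paper.

```python
def serial_time(n_blocks, t_in, t_comp, t_out):
    """Total time when each block is copied in, computed, and copied out in turn."""
    return n_blocks * (t_in + t_comp + t_out)


def pipelined_time(n_blocks, t_in, t_comp, t_out):
    """Total time when the three stages overlap across blocks.

    Assumption: one copy engine per direction plus the compute engine,
    so each stage handles one block at a time and a block enters a stage
    only after it has left the previous one.
    """
    in_done = comp_done = out_done = 0.0
    for _ in range(n_blocks):
        in_done = in_done + t_in                       # copy-in engine is busy back to back
        comp_done = max(comp_done, in_done) + t_comp   # wait for the data and for the kernel slot
        out_done = max(out_done, comp_done) + t_out    # wait for the result and for the copy-out slot
    return out_done


if __name__ == "__main__":
    # Hypothetical per-block costs in arbitrary time units.
    print("serial:            ", serial_time(4, 2, 1, 2))
    print("pipelined:         ", pipelined_time(4, 2, 1, 2))
    # Hypothetical effect of compression: transfers halve, the kernel
    # stage grows because it also compresses and decompresses.
    print("pipelined + compr.:", pipelined_time(4, 1, 1.5, 1))
```

In this model the transfer stages dominate the pipelined schedule, so reducing the transferred volume shortens the total time even though the compute stage becomes more expensive. Whether that trade-off pays off on real hardware depends on the actual compression ratio and kernel cost.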

Abstract

Stencil computation is an important class of scientific applications that can be efficiently executed by graphics processing units (GPUs). The out-of-core approach helps run large-scale stencil codes that process data larger than the limited capacity of GPU memory. However, the performance of GPU-based out-of-core stencil computation is always limited by the data transfer between the CPU and GPU. Many optimizations have been explored to reduce such data transfer, but the use of on-the-fly compression techniques has not been studied sufficiently. In this study, we propose a method that accelerates GPU-based out-of-core stencil computation with on-the-fly compression. We introduce a novel data compression approach that solves the data dependency between two contiguous decomposed data blocks. We also modify a widely used GPU-based compression library to support pipelining…
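
The data dependency between contiguous blocks mentioned in the abstract can be illustrated with a small sketch. It assumes a 1D three-point averaging stencil with fixed boundary cells and a one-cell halo, all of which are illustrative choices and not details taken from the paper; `stencil_step` and `blocked_step` are hypothetical names. The slice `u[lo:hi]` stands in for the host-to-device copy of one block, and no compression is modelled.

```python
HALO = 1  # a 3-point stencil reads one neighbour on each side


def stencil_step(u):
    """One 3-point averaging step over the whole array; end cells stay fixed."""
    out = list(u)
    for i in range(1, len(u) - 1):
        out[i] = (u[i - 1] + u[i] + u[i + 1]) / 3.0
    return out


def blocked_step(u, block_size):
    """The same step computed block by block, as an out-of-core code would.

    Each block is loaded together with HALO cells from its neighbours,
    because updating a block's edge cells needs values owned by the
    adjacent block.
    """
    n = len(u)
    out = list(u)
    for start in range(0, n, block_size):
        stop = min(start + block_size, n)
        lo = max(start - HALO, 0)
        hi = min(stop + HALO, n)
        chunk = u[lo:hi]  # stands in for the host-to-device copy
        # Update only the cells this block owns, skipping the fixed end cells.
        for i in range(max(start, 1), min(stop, n - 1)):
            j = i - lo  # position of cell i inside the loaded chunk
            out[i] = (chunk[j - 1] + chunk[j] + chunk[j + 1]) / 3.0
    return out


if __name__ == "__main__":
    grid = [float(i * i % 7) for i in range(10)]
    print(blocked_step(grid, 4) == stencil_step(grid))
```

Because neighbouring blocks share halo cells, the blocks cannot be treated as fully independent units. A scheme that compresses each block separately has to account for this overlap, which is the kind of dependency the abstract refers to.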

Taxonomy

Topics: Advanced Data Storage Technologies · Algorithms and Data Compression · Parallel Computing and Optimization Techniques