GPU Support for Automatic Generation of Finite-Differences Stencil Kernels

Vitor Hugo Mickus Rodrigues; Lucas Cavalcante; Maelso Bruno Pereira; Fabio Luporini; István Reguly; Gerard Gorman; Samuel Xavier de Souza

arXiv:1912.00695 · cs.DC · August 5, 2020

TL;DR

This paper introduces an extension to the Devito compiler that automatically generates optimized GPU kernels for finite-difference stencil computations, significantly improving performance for seismic inversion algorithms.

Contribution

It presents a high-level symbolic code generation approach for GPU-accelerated finite-difference kernels, bridging the gap between symbolic programming and high-performance GPU execution.
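As an illustration of what a finite-difference stencil kernel computes, the following is a minimal hand-written sketch in pure Python. It is not Devito or OP-DSL output and does not use either API. The function names `laplacian_1d` and `wave_step`, the 1D wave equation, the second-order discretization, and the zero-valued boundaries are all assumptions chosen to keep the example small.

```python
def laplacian_1d(u, dx):
    """Second-order central-difference approximation of d2u/dx2.

    Returns one value per interior point, so the result has len(u) - 2 entries.
    """
    return [(u[i - 1] - 2.0 * u[i] + u[i + 1]) / dx**2
            for i in range(1, len(u) - 1)]


def wave_step(u_prev, u_curr, c, dt, dx):
    """Advance u_tt = c^2 * u_xx by one explicit time step.

    Boundary values are held at zero (an assumption made for this sketch).
    """
    lap = laplacian_1d(u_curr, dx)
    u_next = [0.0] * len(u_curr)
    for i in range(1, len(u_curr) - 1):
        # Each output point reads a fixed neighbourhood of the input:
        # this access pattern is what makes the update a "stencil".
        u_next[i] = 2.0 * u_curr[i] - u_prev[i] + (c * dt) ** 2 * lap[i - 1]
    return u_next


if __name__ == "__main__":
    pulse = [0.0, 0.0, 1.0, 0.0, 0.0]
    print(wave_step(pulse, pulse, c=1.0, dt=0.5, dx=1.0))
```

Every interior point is updated independently from the values of the previous time steps, which is the data parallelism that makes such kernels a natural fit for GPUs. Each update also performs only a few arithmetic operations per value loaded from memory.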

Findings

01. Achieves 63% of the V100's peak performance

02. Achieves 24% of the Titan Z's peak performance

03. Memory optimization is key to performance improvements

Abstract

The growth of data to be processed in the Oil & Gas industry matches the requirements imposed by evolving algorithms based on stencil computations, such as Full Waveform Inversion and Reverse Time Migration. Graphics processing units (GPUs) are an attractive architectural target for stencil computations because of their high degree of data parallelism. However, the rapid architectural and technological progression makes it difficult for even the most proficient programmers to remain up to date with the technological advances at a micro-architectural level. In this work, we present an extension for Devito, an open source compiler designed to produce highly optimized finite difference kernels for use in inversion methods. We embed it with the Oxford Parallel Domain Specific Language (OP-DSL) in order to enable automatic code generation for GPU architectures from a high-level…
