GPU performance analysis of a nodal discontinuous Galerkin method for acoustic and elastic models

Axel Modave; Amik St-Cyr; Tim Warburton

arXiv:1602.07997 · physics.comp-ph · April 20, 2016 · Comput. Geosci.

TL;DR

This paper evaluates GPU implementations of a nodal discontinuous Galerkin method for acoustic and elastic models on a single NVIDIA GTX980. It compares specialized compute kernels with BLAS-based strategies and identifies the range of polynomial degrees over which each approach performs best.

Contribution

It compares several GPU implementation strategies for the discontinuous Galerkin method within a single, unified testing framework, giving a like-for-like view of their efficiency.

Findings

01

Specialized kernels written with a one-node-per-thread strategy are competitive up to polynomial degree 5 for the acoustic model and degree 7 for the elastic model.

02

Above those degrees, a cuBLAS-based strategy performs better.

03

Achieved up to 35.7% of the GPU's theoretical peak arithmetic throughput.
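The degree thresholds in the findings can be read as a simple selection rule. The Python sketch below encodes them; `choose_strategy`, `nodes_per_tet` and the strategy labels are hypothetical names, the thresholds are the GTX980 results quoted in the abstract and may not carry over to other GPUs, and the node-count formula assumes tetrahedral elements. Because "competitive" means comparable rather than strictly faster, the rule only marks where specialized kernels remain a reasonable choice.

```python
# Hypothetical selection rule built from the thresholds reported in the abstract
# (measured on an NVIDIA GTX980; other GPUs may shift them).
SPECIALIZED_MAX_DEGREE = {"acoustic": 5, "elastic": 7}


def nodes_per_tet(degree: int) -> int:
    """Nodes of a degree-N nodal basis on a tetrahedron (element type assumed)."""
    return (degree + 1) * (degree + 2) * (degree + 3) // 6


def choose_strategy(model: str, degree: int) -> str:
    """Return 'specialized' while one-node-per-thread kernels are competitive,
    otherwise 'cublas' (hypothetical labels)."""
    if degree < 0:
        raise ValueError("polynomial degree must be non-negative")
    if model not in SPECIALIZED_MAX_DEGREE:
        raise ValueError(f"unknown model: {model!r}")
    return "specialized" if degree <= SPECIALIZED_MAX_DEGREE[model] else "cublas"


# Print the suggested strategy and per-element node count around the thresholds.
for model in ("acoustic", "elastic"):
    for n in (5, 6, 7, 8):
        print(f"{model:8s} N={n} Np={nodes_per_tet(n):3d} -> {choose_strategy(model, n)}")
```

The node count grows cubically with the degree on tetrahedra, which is why the per-element matrices handled by each kernel become much larger past these thresholds.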

Abstract

Finite element schemes based on discontinuous Galerkin methods possess features amenable to massively parallel computing accelerated with general-purpose graphics processing units (GPUs). However, the computational performance of such schemes strongly depends on their implementation. In the past, several implementation strategies have been proposed. They either rely exclusively on specialized compute kernels tuned for each operation or leverage BLAS libraries that provide optimized routines for basic linear algebra operations. In this paper, we present and analyze up-to-date performance results for different implementations, tested in a unified framework on a single NVIDIA GTX980 GPU. We show that specialized kernels written with a one-node-per-thread strategy are competitive for polynomial bases up to the fifth and seventh degrees for acoustic and elastic models, respectively.…
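The two families of implementations described in the abstract differ mainly in how the elemental matrix products are organized. The Python/NumPy sketch below applies a reference differentiation matrix `D` (Np × Np, assumed shared by all elements) to nodal values `U` stored one element per column (Np × K). The loop version mimics a one-node-per-thread kernel, where each (node, element) pair computes one dot product; the BLAS-style version issues a single matrix-matrix product, the role a cuBLAS GEMM call would play on the GPU. This is a CPU illustration under those assumptions, not the paper's code: geometric factors, surface terms and the actual CUDA/cuBLAS calls are omitted, and the function names are hypothetical.

```python
import numpy as np


def apply_d_one_node_per_thread(D: np.ndarray, U: np.ndarray) -> np.ndarray:
    """Emulate a one-node-per-thread kernel on the CPU.

    D: (Np, Np) reference differentiation matrix shared by all elements.
    U: (Np, K) nodal values, one column per element.
    Each (node n, element k) pair stands for one GPU thread computing a
    single dot product of row n of D with the nodal values of element k.
    """
    Np, K = U.shape
    out = np.empty_like(U)
    for k in range(K):
        for n in range(Np):
            out[n, k] = D[n, :] @ U[:, k]
    return out


def apply_d_gemm(D: np.ndarray, U: np.ndarray) -> np.ndarray:
    """BLAS-style strategy: all elements in one matrix-matrix product (GEMM)."""
    return D @ U


rng = np.random.default_rng(0)
Np, K = 20, 1000  # Np = 20 matches degree 3 on a tetrahedron (assumed)
D = rng.standard_normal((Np, Np))
U = rng.standard_normal((Np, K))
assert np.allclose(apply_d_one_node_per_thread(D, U), apply_d_gemm(D, U))
```

Both versions compute the same result; the performance difference the paper measures comes from how each organization maps onto GPU threads and memory, which this sketch does not model.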
