GPU Performance of an Entropy-Stable Discontinuous Galerkin Euler Solver with Non-Conservative Terms

Henry Waterhouse; Maciej Waruszewski; Lucas C. Wilcox; Francis X. Giraldo

arXiv:2605.16684·math.NA·May 19, 2026


TL;DR

This paper presents a GPU implementation of an entropy-stable discontinuous Galerkin solver for the compressible Euler equations with buoyancy, demonstrating high performance, scalability, and energy efficiency on NVIDIA A100 hardware.

Contribution

The paper introduces a GPU-optimized entropy-stable DG solver for the Euler equations with non-conservative terms, whose kernels run about 10x faster and are more than 13x more energy efficient than a highly optimized CPU code.

Findings

01

The most computationally expensive kernel (volume terms) reaches nearly 70% of 64-bit floating-point peak performance on NVIDIA A100 hardware.

02

GPU kernels are 10x faster and more than 13x more energy efficient than a highly optimized CPU code.

03

Solver achieves 2x speedup at 32-bit precision.

Abstract

The entropy-stable discontinuous Galerkin method for the compressible Euler equations with buoyancy is implemented on graphics processing unit (GPU) hardware. We measure the performance of the solver on two three-dimensional problems: the rising thermal bubble and the baroclinic instability in a channel. On NVIDIA A100 hardware, the solver achieves nearly 70% of 64-bit floating-point peak performance for the most computationally expensive kernel (volume terms) and significantly reduces the computational overhead typically incurred by two-point entropy-stable fluxes in the volume terms. We also present strong and weak scaling performance of the solver and compare it to a highly optimized central processing unit (CPU) code, showing that the GPU kernels are 10x faster and better than 13x more energy efficient than the CPU code. We also show that the solver…
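To make the "overhead of two-point fluxes in the volume terms" concrete, here is a minimal sketch of a flux-differencing volume term on a single 1D element. It is not the paper's solver: it assumes a scalar Burgers equation as a stand-in for the Euler system, and the names `diff_matrix`, `burgers_ec_flux`, and `volume_term` are hypothetical. The point it illustrates is that every node pair (i, j) along a line of nodes needs its own flux evaluation, so the work grows quadratically with the number of nodes per direction.

```python
import numpy as np

def diff_matrix(x):
    """Polynomial differentiation matrix on nodes x (barycentric formula)."""
    n = len(x)
    # Barycentric weights w_j = 1 / prod_{k != j} (x_j - x_k)
    w = np.array([1.0 / np.prod(x[i] - np.delete(x, i)) for i in range(n)])
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j:
                D[i, j] = (w[j] / w[i]) / (x[i] - x[j])
        D[i, i] = -np.sum(D[i, :])  # each row sums to zero
    return D

def burgers_ec_flux(ul, ur):
    """Entropy-conservative two-point flux for Burgers (assumed stand-in)."""
    return (ul * ul + ul * ur + ur * ur) / 6.0

def volume_term(D, u, flux):
    """Flux-differencing volume term: 2 * sum_j D[i, j] * flux(u[i], u[j])."""
    n = len(u)
    out = np.zeros(n)
    for i in range(n):
        for j in range(n):  # one two-point flux per node pair
            out[i] += 2.0 * D[i, j] * flux(u[i], u[j])
    return out

# Four Legendre-Gauss-Lobatto nodes on [-1, 1]
x = np.array([-1.0, -1.0 / np.sqrt(5.0), 1.0 / np.sqrt(5.0), 1.0])
D = diff_matrix(x)
dfdx = volume_term(D, x, burgers_ec_flux)  # approximates d(u^2/2)/dx for u = x
print(dfdx)
```

In three dimensions the same pairwise loop runs along each coordinate line through a node, which is why the volume kernel tends to dominate the cost and is the natural target for GPU optimization.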
