Methods for compressible fluid simulation on GPUs using high-order finite differences
Johannes Pekkilä (1), Miikka S. Väisälä (2), Maarit J. Käpylä (3,1), Petri J. Käpylä (4,1,3), Omer Anjum (5,1) ((1) ReSoLVE Center of Excellence, Aalto, (2) Department of Physics, University of Helsinki, (3) Max-Planck-Institut für Sonnensystemforschung

TL;DR
This paper presents optimized GPU methods for high-order finite-difference fluid simulation. Cache blocking and the decomposition of a latency-bound kernel into several bandwidth-bound kernels reduce memory bandwidth and cache requirements, giving a 3.6x speedup over a comparable CPU-based hydrodynamics solver.
Contribution
It introduces two GPU-based sixth-order finite-difference approaches to compressible fluid simulation, using 55-point and 19-point stencils, and reduces memory bandwidth and cache demands through cache blocking and kernel decomposition.
Findings
The fastest implementation is bandwidth-bound and integrates 343 million grid points per second on a Tesla K40t GPU.
It provides a 3.6x speedup over a comparable CPU-based hydrodynamics solver.
Cache blocking and kernel decomposition reduce the demands on memory bandwidth and cache size.
Abstract
We focus on implementing and optimizing a sixth-order finite-difference solver for simulating compressible fluids on a GPU using third-order Runge-Kutta integration. Because graphics processing units perform well in data-parallel tasks, they are an attractive platform for fluid simulation. However, high-order stencil computation is memory-intensive with respect to both main memory and the caches of the GPU. We present two approaches for simulating compressible fluids using 55-point and 19-point stencils. We seek to reduce the requirements for memory bandwidth and cache size in our methods by using cache blocking and decomposing a latency-bound kernel into several bandwidth-bound kernels. Our fastest implementation is bandwidth-bound and integrates 343 million grid points per second on a Tesla K40t GPU, achieving a 3.6x speedup over a comparable hydrodynamics solver…
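To make "sixth-order finite difference" concrete, here is a minimal sketch in plain Python rather than GPU code. It applies the standard textbook sixth-order central-difference coefficients for a first derivative in one dimension. The function names are hypothetical and this is not the paper's implementation. The point count at the end rests on an assumption: that the 19-point stencil is the axis-aligned one of radius 3 in three dimensions (one centre point plus six neighbours per axis), which is consistent with the number quoted but is not stated in the abstract.

```python
import math

# Standard sixth-order central-difference coefficients for the first
# derivative, for offsets -3..3 (textbook values, not taken from the paper).
COEFFS = (-1 / 60, 3 / 20, -3 / 4, 0.0, 3 / 4, -3 / 20, 1 / 60)
RADIUS = 3


def ddx_sixth_order(f, i, h):
    """Approximate f'(x_i) from the samples f[i-3] .. f[i+3] with spacing h."""
    offsets = range(-RADIUS, RADIUS + 1)
    return sum(c * f[i + k] for c, k in zip(COEFFS, offsets)) / h


def axis_stencil_points(radius, dims=3):
    """Count points of an axis-aligned stencil: centre + 2*radius per axis."""
    return 1 + 2 * radius * dims


# Hypothetical usage: differentiate sin(x) on a small uniform grid.
h = 0.1
xs = [k * h for k in range(7)]
samples = [math.sin(x) for x in xs]
approx = ddx_sixth_order(samples, 3, h)  # derivative at xs[3]
error = abs(approx - math.cos(xs[3]))
n_points = axis_stencil_points(RADIUS)
```

Each derivative at a grid point reads six neighbours along one axis. In three dimensions with several fields, those reads add up, which is why the abstract describes high-order stencil computation as memory-intensive.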
