Hyperbolic Diffusion in Flux Reconstruction: Optimisation through Kernel Fusion within Tensor-Product Elements
Will Trojak, Rob Watson, and Freddie Witherden

TL;DR
This paper hyperbolises the diffusion terms in flux reconstruction so that GPU kernels can be fused, yielding fused kernels that run 3-4 times faster and cutting total runtime by approximately 25%.
Contribution
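It presents two GPU kernel-fusion approaches for the artificial compressibility method with flux reconstruction on tensor-product elements, made possible by hyperbolising the diffusion terms, together with further optimisations and speedups demonstrated in three-dimensional test cases.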
Findings
Fused kernels achieved a 3-4 times speedup, against a theoretical maximum of 4.
Reduced total runtime by approximately 25%.
Demonstrated a 2.3 times speedup over the standard artificial compressibility method (ACM).
Abstract
Novel methods are presented in this initial study for the fusion of GPU kernels in the artificial compressibility method (ACM), using tensor-product elements with constant Jacobians and flux reconstruction. This is made possible through the hyperbolisation of the diffusion terms, which eliminates the expensive algorithmic steps needed to form the viscous stresses. Two fusion approaches are presented, which offer differing levels of parallelism. This is found to be necessary to accommodate the change in workload as the order of accuracy of the elements is increased. Several further optimisations of these approaches are demonstrated, including a generation-time memory manager which maximises resource usage. The fused kernels are able to achieve 3-4 times speedup, which compares favourably with a theoretical maximum speedup of 4. In three-dimensional test cases, the generated fused kernels are found…
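To make the idea of hyperbolising a diffusion term concrete, the following is a minimal sketch on a scalar model problem, not the paper's ACM Navier-Stokes formulation or its flux reconstruction discretisation. It assumes a common first-order relaxation form, u_τ = ν q_x + s and q_τ = (u_x − q)/T_r, marched in pseudo-time with a first-order upwind finite-volume scheme on a periodic domain. The function name `solve_hyperbolic_diffusion`, the source term, and the length scale L_r = 1/(2π) are illustrative assumptions.

```python
import numpy as np


def solve_hyperbolic_diffusion(n=100, nu=1.0, n_iter=4000):
    """Relax u_tau = nu*q_x + s, q_tau = (u_x - q)/T_r to steady state.

    Hypothetical model problem on the periodic domain [0, 1): at steady state
    q = u_x and nu*u_xx + s = 0, i.e. the original diffusion equation.
    """
    h = 1.0 / n
    x = (np.arange(n) + 0.5) * h              # cell centres
    L_r = 1.0 / (2.0 * np.pi)                 # assumed relaxation length scale
    T_r = L_r**2 / nu                         # relaxation time
    a = np.sqrt(nu / T_r)                     # wave speed; eigenvalues are +a and -a
    dt = 0.4 * h / a                          # explicit pseudo-time step (CFL 0.4)
    s = nu * (2.0 * np.pi) ** 2 * np.sin(2.0 * np.pi * x)  # exact u = sin(2*pi*x)

    u = np.zeros(n)
    q = np.zeros(n)
    for _ in range(n_iter):
        u_r, q_r = np.roll(u, -1), np.roll(q, -1)   # right neighbours (periodic)
        # Conservation form U_tau + F_x = S with F = (-nu*q, -u/T_r).
        # The flux is algebraic in the state (u, q): no gradient is computed.
        # Upwind flux at face i+1/2; |A| = a*I because A @ A = (nu/T_r)*I.
        f_u = 0.5 * (-nu * q - nu * q_r) - 0.5 * a * (u_r - u)
        f_q = 0.5 * (-u / T_r - u_r / T_r) - 0.5 * a * (q_r - q)
        # Flux difference (face i+1/2 minus face i-1/2) plus source terms.
        u_new = u + dt * (-(f_u - np.roll(f_u, 1)) / h + s)
        q_new = q + dt * (-(f_q - np.roll(f_q, 1)) / h - q / T_r)
        u, q = u_new, q_new
    return x, u, q


if __name__ == "__main__":
    x, u, q = solve_hyperbolic_diffusion()
    print("max |u - exact|:", np.max(np.abs(u - np.sin(2.0 * np.pi * x))))
    print("max |q - exact|:", np.max(np.abs(q - 2.0 * np.pi * np.cos(2.0 * np.pi * x))))
```

In this sketch the gradient q is carried as a solved variable, so the diffusive flux depends only on the local state. This is the property that, in the setting the abstract describes, removes the separate steps needed to form the viscous stresses and opens the door to fusing kernels.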
