TL;DR
This paper introduces a highly efficient matrix-free algorithm for evaluating discontinuous Galerkin operators on quadrilateral and hexahedral meshes, optimizing sum factorization kernels for modern processors.
Contribution
The paper develops and analyzes a novel, optimized sum factorization-based framework for fast matrix-free DG operator evaluation, including implementation details and performance benchmarking.
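As a rough illustration of what sum factorization means, the sketch below evaluates a 2D tensor-product expansion at quadrature points by applying one small 1D matrix per direction instead of a single large 2D matrix. It is a minimal pure-Python sketch, not the paper's implementation: the names `apply_1d`, `evaluate_sum_factorized`, and `evaluate_naive` are hypothetical, and `S[q][i]` is assumed to hold the value of the i-th 1D basis function at the q-th 1D quadrature point.

```python
def apply_1d(S, values, axis):
    """Apply the 1D matrix S along one axis of a 2D array (list of lists)."""
    n_q, n = len(S), len(S[0])
    if axis == 0:
        # Contract over the first index: out[q][j] = sum_i S[q][i] * values[i][j]
        n_cols = len(values[0])
        return [[sum(S[q][i] * values[i][j] for i in range(n))
                 for j in range(n_cols)]
                for q in range(n_q)]
    # Contract over the second index: out[r][q] = sum_j S[q][j] * values[r][j]
    return [[sum(S[q][j] * row[j] for j in range(n)) for q in range(n_q)]
            for row in values]


def evaluate_sum_factorized(S, coeffs):
    """Two 1D sweeps; the work per sweep grows like n^3 for n^2 unknowns."""
    tmp = apply_1d(S, coeffs, axis=0)
    return apply_1d(S, tmp, axis=1)


def evaluate_naive(S, coeffs):
    """Reference: full double sum per quadrature point, work grows like n^4."""
    n_q, n = len(S), len(S[0])
    return [[sum(S[q1][i] * S[q2][j] * coeffs[i][j]
                 for i in range(n) for j in range(n))
             for q2 in range(n_q)]
            for q1 in range(n_q)]


if __name__ == "__main__":
    S = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]   # 3 points, 2 basis functions
    coeffs = [[1.0, 0.0], [2.0, -1.0]]
    print(evaluate_sum_factorized(S, coeffs) == evaluate_naive(S, coeffs))
```

Both functions compute the same values; the factorized version just reorders the sums so that each direction is handled by a small dense 1D kernel, which is the part the paper then optimizes.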
Findings
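In isolation, the sum factorization kernels reach up to 60% of arithmetic peak on Intel Haswell and Broadwell processors and up to 50% on Intel Knights Landing.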
Performance is often within 10% of memory bandwidth limits.
Full operator evaluation reaches only about half the throughput of the isolated kernels; it is bandwidth-bound, limited by data movement and communication.
Abstract
We present an algorithmic framework for matrix-free evaluation of discontinuous Galerkin finite element operators based on sum factorization on quadrilateral and hexahedral meshes. We identify a set of kernels for fast quadrature on cells and faces targeting a wide class of weak forms originating from linear and nonlinear partial differential equations. Different algorithms and data structures for the implementation of operator evaluation are compared in an in-depth performance analysis. The sum factorization kernels are optimized by vectorization over several cells and faces and an even-odd decomposition of the one-dimensional compute kernels. In isolation our implementation then reaches up to 60% of arithmetic peak on Intel Haswell and Broadwell processors and up to 50% of arithmetic peak on Intel Knights Landing. The full operator evaluation reaches only about half that throughput…
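The even-odd decomposition mentioned in the abstract can be sketched for a single 1D kernel as follows. This is a minimal pure-Python sketch under one stated assumption, not the paper's code: the 1D matrix is taken to satisfy the mirror symmetry `S[q][i] == S[n_q-1-q][n-1-i]`, which holds for basis functions and evaluation points placed symmetrically about the interval midpoint. The function names `matvec` and `even_odd_matvec` are hypothetical.

```python
def matvec(S, u):
    """Reference: plain dense matrix-vector product."""
    return [sum(S_qi * u_i for S_qi, u_i in zip(row, u)) for row in S]


def even_odd_matvec(S, u):
    """Matrix-vector product exploiting S[q][i] == S[n_q-1-q][n-1-i].

    Only the upper half of the rows is processed; each row yields two
    outputs, so roughly half the multiplications of matvec are needed.
    """
    n_q, n = len(S), len(S[0])
    half_n = n // 2
    half_q = (n_q + 1) // 2

    # Split the input into symmetric (even) and antisymmetric (odd) parts.
    even = [u[i] + u[n - 1 - i] for i in range(half_n)]
    odd = [u[i] - u[n - 1 - i] for i in range(half_n)]

    # Even and odd parts of the matrix; production code would precompute
    # these once instead of rebuilding them on every call.
    S_even = [[0.5 * (S[q][i] + S[q][n - 1 - i]) for i in range(half_n)]
              for q in range(half_q)]
    S_odd = [[0.5 * (S[q][i] - S[q][n - 1 - i]) for i in range(half_n)]
             for q in range(half_q)]

    y = [0.0] * n_q
    for q in range(half_q):
        e = sum(S_even[q][i] * even[i] for i in range(half_n))
        o = sum(S_odd[q][i] * odd[i] for i in range(half_n))
        if n % 2 == 1:
            # The middle input entry is its own mirror image: purely even.
            e += S[q][half_n] * u[half_n]
        y[q] = e + o
        y[n_q - 1 - q] = e - o   # mirrored row reuses both partial sums
    return y


if __name__ == "__main__":
    S = [[1.0, 2.0, 3.0], [4.0, 5.0, 4.0], [3.0, 2.0, 1.0]]  # mirror-symmetric
    u = [1.0, 2.0, 4.0]
    print(even_odd_matvec(S, u), matvec(S, u))
```

The saving comes from the mirrored row sharing its partial sums with the original row, so each pair of outputs costs about one row's worth of multiplications. For a matrix without the assumed symmetry the result would simply be wrong, so a real implementation would only use this path for kernels known to be symmetric.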
