TL;DR
This paper optimizes GPU kernels for three tensor-product operators used in high-order finite element methods and presents a performance model that measures those optimizations against an empirically calibrated roofline.
Contribution
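It gives a guided overview of GPU optimization strategies, together with a performance model, for tensor-product operators in finite element methods. The strategies target the properties that make these kernels hard to optimize: low arithmetic intensity, tiered structure, and the need to store intermediate results inside the kernel.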
Findings
Achieved near-peak GPU performance for tensor-product operators.
Developed a performance model that compares optimizations against an empirically calibrated roofline.
Identified key optimization strategies for low arithmetic intensity kernels.
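To make "low arithmetic intensity" concrete, here is a minimal roofline sketch. It is not taken from the paper: the function names and the device numbers (`PEAK_GFLOPS`, `BANDWIDTH_GBS`) are hypothetical values chosen for illustration, and the paper calibrates its roofline empirically rather than from fixed constants like these.

```python
def arithmetic_intensity(flops, bytes_moved):
    """Floating-point operations performed per byte moved to or from memory."""
    return flops / bytes_moved


def attainable_gflops(intensity, peak_gflops, bandwidth_gbs):
    """Classic roofline bound: the lower of the compute and memory ceilings."""
    return min(peak_gflops, bandwidth_gbs * intensity)


# Hypothetical device numbers, for illustration only (not from the paper).
PEAK_GFLOPS = 7000.0    # assumed peak compute throughput
BANDWIDTH_GBS = 900.0   # assumed sustained memory bandwidth

# A kernel doing 2 flops per byte moved (an assumed, illustrative ratio).
ai = arithmetic_intensity(flops=2.0e9, bytes_moved=1.0e9)
bound = attainable_gflops(ai, PEAK_GFLOPS, BANDWIDTH_GBS)

# Below the ridge point, performance is limited by bandwidth, not compute.
ridge = PEAK_GFLOPS / BANDWIDTH_GBS
memory_bound = ai < ridge

print(f"intensity={ai:.2f} flop/byte, bound={bound:.0f} GFLOP/s, "
      f"memory_bound={memory_bound}")
```

For a kernel below the ridge point, as in this example, reducing memory traffic raises the bound, while adding compute throughput does not.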
Abstract
This paper is devoted to GPU kernel optimization and performance analysis of three tensor-product operators arising in finite element methods. We provide a mathematical background to these operations and implementation details. Achieving close-to-the-peak performance for these operators requires extensive optimization because of the operators' properties: low arithmetic intensity, tiered structure, and the need to store intermediate results inside the kernel. We give a guided overview of optimization strategies and we present a performance model that allows us to compare the efficacy of these optimizations against an empirically calibrated roofline.
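As a rough illustration of why such operators need intermediate storage, the sketch below applies a two-dimensional tensor-product operator (B ⊗ B) in two one-dimensional stages and checks the result against the direct four-index sum. This is the standard sum-factorization idea, shown in plain Python on a tiny example. It is an assumption that it resembles the paper's operators, which the abstract does not specify, and the names `apply_naive` and `apply_factored` are hypothetical.

```python
def apply_naive(B, X):
    """Direct evaluation: Y[i][j] = sum_{k,l} B[i][k] * B[j][l] * X[k][l]."""
    n = len(B)
    return [[sum(B[i][k] * B[j][l] * X[k][l]
                 for k in range(n) for l in range(n))
             for j in range(n)]
            for i in range(n)]


def apply_factored(B, X):
    """Two-stage evaluation with an intermediate tensor T (sum factorization)."""
    n = len(B)
    # Stage 1: contract the first index. T must be stored before stage 2 runs.
    T = [[sum(B[i][k] * X[k][l] for k in range(n)) for l in range(n)]
         for i in range(n)]
    # Stage 2: contract the second index of the intermediate result.
    return [[sum(T[i][l] * B[j][l] for l in range(n)) for j in range(n)]
            for i in range(n)]


B = [[1, 2], [3, 4]]
X = [[1, 2], [3, 4]]

Y_naive = apply_naive(B, X)
Y_factored = apply_factored(B, X)
print(Y_naive == Y_factored)
```

The factored form does fewer multiplications as the size grows, but the intermediate `T` has to live somewhere between the two stages. On a GPU that typically means registers or shared memory, which is one reason these kernels are sensitive to implementation choices.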
