Array Program Transformation with Loo.py by Example: High-Order Finite Elements
Andreas Klöckner, Lucas C. Wilcox, T. Warburton

TL;DR
This paper demonstrates how Loo.py transforms two real-world Fortran subroutines from a weather model into a single high-performance GPU kernel, using that transformation path to showcase its capabilities for high-order finite element computations.
Contribution
It introduces a systematic transformation process from Fortran subroutines to GPU kernels using Loo.py, highlighting novel mechanized code conversion techniques.
Findings
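Applied kernel fusion, vectorization, prefetching, and parallelization along the transformation path.
Reported performance results that support the effectiveness of the applied transformations.
Carried out the transformations on two real-world Fortran subroutines from a weather model.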
Abstract
To concisely and effectively demonstrate the capabilities of our program transformation system Loo.py, we examine a transformation path from two real-world Fortran subroutines as found in a weather model to a single high-performance computational kernel suitable for execution on modern GPU hardware. Along the transformation path, we encounter kernel fusion, vectorization, prefetching, parallelization, and algorithmic changes achieved by mechanized conversion between imperative and functional/substitution-based code, among a number more. We conclude with performance results that demonstrate the effects and support the effectiveness of the applied transformations.
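As a rough illustration of what kernel fusion means in the abstract above, the following minimal sketch merges two array passes into one. It is written in plain Python and does not use Loo.py's API. The function names (`scale`, `add_source`, `unfused`, `fused`) and the arithmetic are hypothetical and are not taken from the paper or the weather model.

```python
# Illustrative only: plain Python, not Loo.py's API. All names are hypothetical.

def scale(u, alpha):
    """First 'kernel': writes an intermediate array tmp[i] = alpha * u[i]."""
    return [alpha * x for x in u]


def add_source(tmp, s):
    """Second 'kernel': reads the intermediate array and adds a source term."""
    return [t + x for t, x in zip(tmp, s)]


def unfused(u, s, alpha):
    """Two passes: the intermediate array is stored, then read back."""
    tmp = scale(u, alpha)
    return add_source(tmp, s)


def fused(u, s, alpha):
    """Fused 'kernel': one loop, the intermediate value stays in a local."""
    out = []
    for i in range(len(u)):
        t = alpha * u[i]      # no intermediate array is materialized
        out.append(t + s[i])
    return out


u = [1.0, 2.0, 3.0]
s = [0.5, 0.5, 0.5]

# Fusion changes how the work is organized, not the result.
assert fused(u, s, 2.0) == unfused(u, s, 2.0)
```

The fused version avoids storing and re-reading the intermediate array, which is the usual motivation for fusion on bandwidth-limited hardware such as GPUs. A transformation system aims to derive the second form from the first mechanically rather than by hand.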
