Loo.py: transformation-based code generation for GPUs and CPUs
Andreas Klöckner

TL;DR
Loo.py is a Python-based system that simplifies generating high-performance code for diverse hardware by transforming array computations with a rich set of optimization techniques.
Contribution
It introduces a data model and transformation library for array computations, enabling flexible, high-performance code generation across CPUs and GPUs.
Findings
Supports a wide range of transformations like tiling, vectorization, and unrolling.
Facilitates gradual transition from prototype to optimized code.
Optional, deep integration with numpy and PyOpenCL provides a convenient computing environment.
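To make "mathematically equivalent variants" concrete, the sketch below hand-writes one of the listed transformations, loop tiling, in plain Python. It does not use Loo.py's API: the function names `scale_untiled` and `scale_tiled` and the `tile` parameter are hypothetical, chosen only for illustration. The point is that both variants compute the same result while iterating in a different order, which is the kind of rewrite a transformation library would apply mechanically.

```python
# Illustrative only: hand-written loop tiling, not Loo.py's actual API.

def scale_untiled(a, factor):
    """Reference variant: one flat loop over all indices."""
    out = [0.0] * len(a)
    for i in range(len(a)):
        out[i] = factor * a[i]
    return out


def scale_tiled(a, factor, tile=4):
    """Same computation with index i split into (i_outer, i_inner)."""
    n = len(a)
    out = [0.0] * n
    n_tiles = (n + tile - 1) // tile  # ceiling division, covers a partial last tile
    for i_outer in range(n_tiles):
        for i_inner in range(tile):
            i = i_outer * tile + i_inner
            if i < n:  # guard: the last tile may extend past the array end
                out[i] = factor * a[i]
    return out


data = [float(k) for k in range(10)]
# Both variants must agree, including when the length is not a multiple of the tile size.
assert scale_tiled(data, 2.0, tile=4) == scale_untiled(data, 2.0)
```

On a GPU target, the outer index would typically map to work groups and the inner index to work items; here both stay sequential so the equivalence is easy to check.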
Abstract
Today's highly heterogeneous computing landscape places a burden on programmers wanting to achieve high performance on a reasonably broad cross-section of machines. To do so, computations need to be expressed in many different but mathematically equivalent ways, with, in the worst case, one variant per target machine. Loo.py, a programming system embedded in Python, meets this challenge by defining a data model for array-style computations and a library of transformations that operate on this model. Offering transformations such as loop tiling, vectorization, storage management, unrolling, instruction-level parallelism, change of data layout, and many more, it provides a convenient way to capture, parametrize, and re-unify the growth among code variants. Optional, deep integration with numpy and PyOpenCL provides a convenient computing environment where the transition from prototype…
