Exploiting pre-optimized kernels with polyhedral transformations for CGRA compilation
Yuxuan Wang, María José Belda, Fernando Castro, Katzalin Olcoz, David Atienza, Giovanni Ansaloni

TL;DR
This paper presents a compilation approach that uses polyhedral transformations to expose matrix-matrix multiplication (mmul) patterns hidden inside computational kernels and map them onto a specialized CGRA mmul schedule, reporting speedups of up to 9.1x on benchmarks with hidden mmuls.
Contribution
The paper introduces a specialized mmul kernel schedule, parametrizable across CGRA sizes, and a compilation methodology that uses polyhedral analysis to adapt program representations so that they can leverage that schedule.
Findings
Achieves up to 9.1x speedup on benchmarks with hidden mmuls.
Effectively exposes parallelism in complex computational patterns.
Maximizes resource utilization and runtime performance.
Abstract
Modern computing workloads commonly involve matrix-matrix multiplication (mmul) as a core computing pattern. Coarse-Grained Reconfigurable Arrays (CGRAs) can flexibly and efficiently support it, since they combine operation-level reconfigurability and high energy efficiency. However, mapping computational kernels that include mmul with state-of-the-art compilation strategies often leads to suboptimal results, since its multi-dimensional structure hampers the uncovering of its inherent parallelism and, ultimately, runtime performance. Here, we take a different position: we introduce a specialized mmul CGRA kernel schedule, parametrizable across different CGRA sizes. Then, we describe a novel compilation methodology that adapts program representations to effectively leverage it, employing polyhedral transformations to analyze complex computational patterns and expose hidden mmul…
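To make the notion of a "hidden" mmul concrete, here is a minimal Python sketch. It is not taken from the paper: the function names (`kernel_original`, `kernel_rewritten`, `mmul`) and the example kernel are hypothetical, chosen only to illustrate the kind of rewrite a polyhedral analysis could justify. The original loop nest uses a non-canonical loop order and fuses a scaling factor and a bias into the accumulation, so it does not look like a plain mmul; the rewritten form isolates a standard mmul call that a pre-optimized kernel could replace.

```python
def mmul(A, B):
    """Canonical i-j-k matrix-matrix multiplication (stand-in for a pre-optimized kernel)."""
    n, m, p = len(A), len(B), len(B[0])
    C = [[0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            for k in range(m):
                C[i][j] += A[i][k] * B[k][j]
    return C


def kernel_original(A, B, bias, alpha):
    """Hypothetical kernel: the mmul is obscured by a k-j-i loop order,
    a fused scaling by alpha, and a bias folded into the initialization."""
    n, m, p = len(A), len(B), len(B[0])
    D = [[bias[j] for j in range(p)] for _ in range(n)]
    for k in range(m):
        for j in range(p):
            for i in range(n):
                D[i][j] += alpha * A[i][k] * B[k][j]
    return D


def kernel_rewritten(A, B, bias, alpha):
    """Same computation with the mmul exposed as a separate call,
    followed by an element-wise scale-and-add."""
    C = mmul(A, B)
    n, p = len(C), len(C[0])
    return [[bias[j] + alpha * C[i][j] for j in range(p)] for i in range(n)]


if __name__ == "__main__":
    A = [[1, 2], [3, 4]]
    B = [[5, 6], [7, 8]]
    print(kernel_original(A, B, [1, -1], 2))
    print(kernel_rewritten(A, B, [1, -1], 2))
```

Both functions return identical results for integer inputs. With floating-point data, factoring `alpha` out of the sum can change rounding, so a real compiler would need to decide whether that reassociation is acceptable. The loop reordering itself is safe here because each `D[i][j]` accumulates its own independent sum over `k`.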
