Optimizing GEMM for Energy and Performance on Versal ACAP Architectures
Ilias Papalamprou, Dimosthenis Masouros, Ioannis Loudaros, Francky Catthoor, Dimitrios Soudris

TL;DR
This paper presents an ML-guided framework for optimizing GEMM mappings on Versal ACAP architectures, significantly improving energy efficiency and performance for edge computing workloads.
Contribution
It introduces an automated, ML-based approach for mapping GEMM on Versal ACAP, addressing energy-performance trade-offs overlooked by prior methods.
Findings
Up to 2.5x throughput improvement
Up to 2.7x energy efficiency gain
Effective design space exploration with ML model
Abstract
General Matrix Multiplication (GEMM) is a fundamental operation in many scientific workloads, signal processing, and particularly deep learning. It is often a bottleneck for performance and energy efficiency, especially in edge environments with tight resource and power constraints. AMD's Versal ACAP offers heterogeneous components (AIEs, PL, PS) that can address these challenges, but mapping GEMM across them is complex, with prior works largely overlooking energy-performance trade-offs. In this paper, we propose an automated framework for Versal ACAP that generates GEMM mappings optimized for either performance or energy efficiency. Unlike prior analytical approaches, our method leverages a Machine Learning (ML) model, trained on approximately 6000 on-board experiments of different GEMM mappings, to guide Design Space Exploration, yielding more efficient designs. Evaluation on the…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsParallel Computing and Optimization Techniques · Low-power high-performance VLSI design · Numerical Methods and Algorithms
