Optimizing GEMM for Energy and Performance on Versal ACAP Architectures

Ilias Papalamprou; Dimosthenis Masouros; Ioannis Loudaros; Francky Catthoor; Dimitrios Soudris

arXiv:2511.06907 · cs.AR · November 11, 2025

Open Access

TL;DR

This paper presents an ML-guided framework for optimizing GEMM mappings on Versal ACAP architectures, reporting up to 2.5x higher throughput and up to 2.7x better energy efficiency, with a focus on resource- and power-constrained edge environments.

Contribution

The paper introduces an automated, ML-based approach for mapping GEMM onto Versal ACAP that can target either performance or energy efficiency, addressing energy-performance trade-offs that prior work largely overlooked.

Findings

01 Up to 2.5x throughput improvement

02 Up to 2.7x energy efficiency gain

03 Effective design space exploration with ML model

Abstract

General Matrix Multiplication (GEMM) is a fundamental operation in many scientific workloads, signal processing, and particularly deep learning. It is often a bottleneck for performance and energy efficiency, especially in edge environments with tight resource and power constraints. AMD's Versal ACAP offers heterogeneous components (AIEs, PL, PS) that can address these challenges, but mapping GEMM across them is complex, with prior works largely overlooking energy-performance trade-offs. In this paper, we propose an automated framework for Versal ACAP that generates GEMM mappings optimized for either performance or energy efficiency. Unlike prior analytical approaches, our method leverages a Machine Learning (ML) model, trained on approximately 6000 on-board experiments of different GEMM mappings, to guide Design Space Exploration, yielding more efficient designs. Evaluation on the…
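The idea of model-guided Design Space Exploration described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the `Mapping` fields, the candidate tile sizes and core counts, and especially `toy_predictor`, whose formulas are invented placeholders standing in for a model trained on on-board measurements. The sketch only shows the general pattern: enumerate candidate mappings, score each with a predictor, and pick the best one for the chosen objective.

```python
from itertools import product
from typing import Callable, Iterator, NamedTuple


class Mapping(NamedTuple):
    """Hypothetical GEMM mapping: tile sizes plus the number of cores used."""
    tile_m: int
    tile_k: int
    tile_n: int
    cores: int


class Prediction(NamedTuple):
    throughput: float  # arbitrary units, higher is better
    power: float       # arbitrary units, lower is better


def enumerate_mappings(m: int, k: int, n: int,
                       tiles=(16, 32, 64),
                       core_counts=(8, 32, 128)) -> Iterator[Mapping]:
    """Yield every candidate whose tiles evenly divide the problem dimensions."""
    for tm, tk, tn, c in product(tiles, tiles, tiles, core_counts):
        if m % tm == 0 and k % tk == 0 and n % tn == 0:
            yield Mapping(tm, tk, tn, c)


def toy_predictor(mp: Mapping) -> Prediction:
    """Placeholder for a trained ML model; these formulas are invented."""
    # Larger tiles reuse more data per element moved (toy heuristic).
    reuse = 1.0 / (1.0 / mp.tile_m + 1.0 / mp.tile_k + 1.0 / mp.tile_n)
    throughput = (mp.cores ** 0.8) * reuse   # sub-linear scaling with cores
    power = 5.0 + 0.5 * mp.cores             # static plus per-core power
    return Prediction(throughput, power)


def explore(m: int, k: int, n: int,
            predictor: Callable[[Mapping], Prediction],
            objective: str = "performance") -> Mapping:
    """Return the candidate mapping with the best predicted score."""
    if objective not in ("performance", "energy"):
        raise ValueError("objective must be 'performance' or 'energy'")

    def score(mp: Mapping) -> float:
        p = predictor(mp)
        if objective == "performance":
            return p.throughput
        return p.throughput / p.power  # throughput per unit power

    return max(enumerate_mappings(m, k, n), key=score)


if __name__ == "__main__":
    print(explore(256, 256, 256, toy_predictor, "performance"))
    print(explore(256, 256, 256, toy_predictor, "energy"))
```

With these toy formulas, the two objectives select different core counts for the same problem size, which is the kind of energy-performance trade-off the abstract refers to. In a real flow, `toy_predictor` would be replaced by a model fitted to measured throughput and power, and the candidate space would reflect the actual device resources.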

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Low-power high-performance VLSI design · Numerical Methods and Algorithms