AIE4ML: An End-to-End Framework for Compiling Neural Networks for the Next Generation of AMD AI Engines

Dimitrios Danopoulos; Enrico Lupi; Chang Sun; Sebastian Dittmeier; Michael Kagan; Vladimir Loncar; Maurizio Pierini

arXiv:2512.15946 · cs.LG · January 19, 2026

TL;DR

AIE4ML is a comprehensive framework that automatically compiles neural networks into optimized firmware for AMD's AIE-ML AI Engines, with forward compatibility for the newer AIE-MLv2 architecture. It targets ultra-low-latency inference with high kernel efficiency and scalability across the 2D AIE array.

Contribution

AIE4ML is the first end-to-end compilation framework for AIE-ML devices, supporting multi-layer models, deterministic placement, and integration with high-level tools.

Findings

01

Achieves up to 98.6% efficiency relative to the single-kernel baseline.

02

Utilizes 97.4% of AIE tiles with entirely on-chip data movement.

03

Delivers GPU-class throughput with microsecond latency.
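One common way to read an "efficiency relative to a single-kernel baseline" figure is as scaling efficiency: the throughput of the multi-tile design divided by the ideal throughput of N independent copies of the single kernel. The sketch below assumes that definition; the page does not state the paper's exact formula, and the function name and example numbers are hypothetical.

```python
def scaling_efficiency(multi_tile_throughput: float,
                       single_kernel_throughput: float,
                       num_tiles: int) -> float:
    """Fraction of ideal linear scaling achieved by a multi-tile design.

    Assumption (not from the paper): ideal throughput is
    num_tiles * single_kernel_throughput, i.e. perfect linear scaling.
    """
    if num_tiles <= 0 or single_kernel_throughput <= 0:
        raise ValueError("num_tiles and single_kernel_throughput must be positive")
    ideal = num_tiles * single_kernel_throughput
    return multi_tile_throughput / ideal


# Illustrative numbers only: 4 tiles reaching 3.6x one kernel's throughput.
print(scaling_efficiency(3.6, 1.0, 4))  # roughly 0.9
```

Under this reading, a value near 1.0 means that adding tiles costs almost nothing in communication or synchronization overhead.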

Abstract

Efficient AI inference on AMD's Versal AI Engine (AIE) is challenging due to tightly coupled VLIW execution, explicit datapaths, and local memory management. Prior work focused on first-generation AIE kernel optimizations, without tackling full neural network execution across the 2D array. In this work, we present AIE4ML, the first comprehensive framework for automatically converting AI models into optimized firmware for AIE-ML-generation devices, with forward compatibility for the newer AIE-MLv2 architecture. At the single-kernel level, we attain performance close to the architectural peak. At the graph and system levels, we provide a structured parallelization method that can scale across the 2D AIE-ML fabric and exploits its dedicated memory tiles to keep data entirely on-chip throughout model execution. As a demonstration, we designed a generalized and highly efficient…
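To make "parallelization across a 2D array" concrete, the sketch below shows a generic 2D block partitioning of a dense layer y = W x. Each hypothetical tile (r, c) owns one weight block and one input slice, computes a partial sum, and the partial sums along each grid row are reduced into the outputs. This is a textbook tiling scheme, not AIE4ML's actual method. It ignores data movement, memory tiles, quantization, and placement, and the names `split_ranges` and `tiled_dense` are illustrative.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def split_ranges(n: int, parts: int) -> List[range]:
    """Split range(n) into `parts` contiguous, nearly equal chunks."""
    base, extra = divmod(n, parts)
    ranges, start = [], 0
    for i in range(parts):
        size = base + (1 if i < extra else 0)
        ranges.append(range(start, start + size))
        start += size
    return ranges


def tiled_dense(W: Matrix, x: Vector, grid_rows: int, grid_cols: int) -> Vector:
    """Compute y = W @ x as if on a grid_rows x grid_cols array of tiles.

    Tile (r, c) owns W[rows_r, cols_c] and x[cols_c] and produces partial
    sums for output rows rows_r; partial sums from all tiles in the same
    grid row are then added together (the reduction step).
    """
    out_dim, in_dim = len(W), len(x)
    row_blocks = split_ranges(out_dim, grid_rows)
    col_blocks = split_ranges(in_dim, grid_cols)
    y = [0.0] * out_dim
    for rows in row_blocks:
        for cols in col_blocks:
            # Local work of one tile: a small matrix-vector product.
            partial = [sum(W[i][j] * x[j] for j in cols) for i in rows]
            # Reduction: accumulate this tile's partial sums into the output.
            for k, i in enumerate(rows):
                y[i] += partial[k]
    return y


W = [[1, 2, 3], [4, 5, 6]]
print(tiled_dense(W, [1, 1, 1], grid_rows=2, grid_cols=2))  # [6.0, 15.0]
```

Splitting the input dimension across grid columns keeps each tile's weight block small enough for local memory. The trade-off is an explicit reduction of partial sums, which a real AIE mapping would have to route through the array's on-chip interconnect.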

Taxonomy

Topics: Advanced Neural Network Applications · Parallel Computing and Optimization Techniques · Big Data and Digital Economy