AEG: A Baremetal Framework for AI Acceleration via Direct Hardware Access in Heterogeneous Accelerators
Hua Jiang, Sayan Mandal, Brandon Kirincich, Govind Varadarajan

TL;DR
This paper presents a hardware-independent baremetal runtime architecture for high-performance ML inference on heterogeneous accelerators, eliminating OS overhead and improving efficiency.
Contribution
It introduces a novel control-as-data paradigm with a minimal runtime hardware abstraction layer for efficient AI acceleration without an OS.
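As a rough illustration of the control-as-data idea, the sketch below flattens a control sequence into a linear list of records and runs it with a generic loop that touches "hardware" only through a tiny abstraction layer. Everything here is an assumption made for illustration: the opcode set (`WRITE32`, `MASK_POLL`, `END`), the `Rcb` record layout, the `FakeRhal` class, and the register addresses are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Dict, List


class Op(IntEnum):
    """Hypothetical opcodes; the paper's actual RCB encoding is not shown here."""
    WRITE32 = 0    # write `value` to register `addr`
    MASK_POLL = 1  # wait until (reg[addr] & mask) == value
    END = 2        # stop execution


@dataclass(frozen=True)
class Rcb:
    """One flattened control record (illustrative layout only)."""
    op: Op
    addr: int = 0
    value: int = 0
    mask: int = 0xFFFFFFFF


class FakeRhal:
    """Stand-in for a minimal hardware abstraction layer, backed by a dict."""

    def __init__(self) -> None:
        self.regs: Dict[int, int] = {}

    def write32(self, addr: int, value: int) -> None:
        self.regs[addr] = value & 0xFFFFFFFF

    def read32(self, addr: int) -> int:
        return self.regs.get(addr, 0)


def run(blocks: List[Rcb], hal: FakeRhal, max_polls: int = 1000) -> int:
    """Execute records strictly in order; return how many were executed."""
    executed = 0
    for blk in blocks:
        if blk.op is Op.END:
            break
        if blk.op is Op.WRITE32:
            hal.write32(blk.addr, blk.value)
        elif blk.op is Op.MASK_POLL:
            # Bounded polling so a stuck register cannot hang the engine.
            for _ in range(max_polls):
                if (hal.read32(blk.addr) & blk.mask) == blk.value:
                    break
            else:
                raise TimeoutError(f"poll timed out at 0x{blk.addr:08x}")
        else:
            raise ValueError(f"unknown opcode {blk.op}")
        executed += 1
    return executed


if __name__ == "__main__":
    hal = FakeRhal()
    program = [
        Rcb(Op.WRITE32, 0x1000, 0x1),         # hypothetical "start" register
        Rcb(Op.WRITE32, 0x1004, 0xABCD),      # hypothetical config register
        Rcb(Op.MASK_POLL, 0x1000, 0x1, 0x1),  # wait for bit 0 to read back set
        Rcb(Op.END),
    ]
    print(run(program, hal), hex(hal.read32(0x1004)))
```

The engine contains no model-specific or device-specific branching: all behavior lives in the record list, and porting to another target would, under these assumptions, mean swapping only the object that provides `write32` and `read32`.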
Findings
9.2× higher compute efficiency compared to Linux-based deployment
3–7× reduction in data movement overhead
Achieves 68.78% Top-1 accuracy on ImageNet with 28 AIE tiles
Abstract
This paper introduces a unified, hardware-independent baremetal runtime architecture designed to enable high-performance machine learning (ML) inference on heterogeneous accelerators, such as AI Engine (AIE) arrays, without the overhead of an underlying real-time or general-purpose operating system. Existing edge-deployment frameworks, such as TinyML, often rely on real-time operating systems (RTOS), which introduce unnecessary complexity and performance bottlenecks. To address this, our solution fundamentally decouples the runtime from hardware specifics by flattening complex control logic into linear, executable Runtime Control Blocks (RCBs). This "Control as Data" paradigm allows high-level models, including Adaptive Data Flow (ADF) graphs, to be executed by a generic engine through a minimal Runtime Hardware Abstraction Layer (RHAL). We further integrate Runtime Platform Management…
