Lifting to tensors when compiling scientific computing workloads for AI Engines
Nick Brown, Gabriel Rodriguez-Canal

TL;DR
This paper presents a compilation pipeline that lifts scientific computing code semantics into tensors, enabling efficient mapping to AMD's AI Engines with reduced code complexity and improved performance and energy efficiency.
Contribution
It introduces a tensor-based lifting approach for compiling scientific codes to AI Engines, simplifying porting and enhancing performance.
Findings
The NPU performs comparably to a multicore CPU for float32 kernels.
Two kernels show a performance improvement of up to 40% when the CPU and NPU are used together.
Energy consumption is reduced by 15% with the new approach.
Abstract
It has been demonstrated that specialised architectures, such as FPGAs and AMD's AI Engines (AIEs), have the potential to deliver energy and performance advantages for scientific computing. Given that AIEs are now integrated into AMD's CPUs, they are an interesting avenue, especially when executing on the edge or making better use of local, compute-constrained resources. However, a major challenge is enabling existing codes to run on this architecture without extensive modification: put simply, porting codes to the AIE's execution model requires significant expertise and time. In this paper we explore a compilation pipeline for efficiently mapping loops in general-purpose scientific codes to AIEs. By lifting the semantics of an application into tensors, we demonstrate that this approach is able to capture the intention of general-purpose loops annotated with OpenMP and such high-level…
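To make the idea of lifting a loop into a tensor operation concrete, the following is a hypothetical toy, not the authors' pipeline. It takes a single loop whose body has the form `out[i] = a[i] op b[i]` (the kind of loop one might annotate with an OpenMP `parallel for`), checks that every array access is indexed only by the loop variable, and if so replaces the scalar loop with one whole-tensor elementwise operation. The `Loop`, `Access`, `TensorOp` and `lift` names are illustrative assumptions; a real compiler works over a full intermediate representation and must handle far richer access patterns.

```python
from dataclasses import dataclass
import operator

# Hypothetical, minimal loop IR (an assumption for illustration only):
# one loop over index `idx` whose body is out[idx] = lhs[idx] <op> rhs[idx].
@dataclass(frozen=True)
class Access:
    array: str  # name of the array being read or written
    index: str  # name of the index variable used to subscript it

@dataclass(frozen=True)
class Loop:
    idx: str
    extent: int
    out: Access
    op: str  # "+" or "*"
    lhs: Access
    rhs: Access

@dataclass(frozen=True)
class TensorOp:
    kind: str       # e.g. "elementwise_add"
    inputs: tuple   # names of the input tensors
    output: str     # name of the output tensor
    shape: tuple    # shape of every operand

_OPS = {
    "+": ("elementwise_add", operator.add),
    "*": ("elementwise_mul", operator.mul),
}

def lift(loop):
    """Return a TensorOp if every access is indexed by the loop variable
    alone (a pure elementwise pattern); otherwise return None."""
    accesses = (loop.out, loop.lhs, loop.rhs)
    if loop.op not in _OPS or any(a.index != loop.idx for a in accesses):
        return None
    kind, _ = _OPS[loop.op]
    return TensorOp(kind, (loop.lhs.array, loop.rhs.array),
                    loop.out.array, (loop.extent,))

def run_loop(loop, env):
    """Reference semantics: execute the scalar loop one element at a time."""
    fn = _OPS[loop.op][1]
    out = [0.0] * loop.extent
    for i in range(loop.extent):
        out[i] = fn(env[loop.lhs.array][i], env[loop.rhs.array][i])
    return out

def run_tensor(op, env):
    """Execute the lifted op as a single whole-tensor operation."""
    fn = {"elementwise_add": operator.add,
          "elementwise_mul": operator.mul}[op.kind]
    a, b = (env[name] for name in op.inputs)
    return [fn(x, y) for x, y in zip(a, b)]

if __name__ == "__main__":
    env = {"a": [1.0, 2.0, 3.0], "b": [4.0, 5.0, 6.0]}
    loop = Loop("i", 3, Access("c", "i"), "+", Access("a", "i"), Access("b", "i"))
    op = lift(loop)
    print(op.kind)                                  # elementwise_add
    print(run_tensor(op, env) == run_loop(loop, env))  # True
```

The design choice illustrated is that lifting succeeds only when the loop's intent can be stated without reference to individual iterations; once it is expressed as a tensor operation, a backend is free to decide how to tile and schedule it on the target hardware. A loop that reads `a[j]` inside a loop over `i` is rejected and left as scalar code.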
