Design Rules for Extreme-Edge Scientific Computing on AI Engines

Zhenghua Ma; G Abarajithan; Dimitrios Danopoulos; Olivia Weng; Francesco Restuccia; Ryan Kastner

arXiv:2604.19106·cs.AR·April 22, 2026

TL;DR

This paper compares AI Engines with programmable logic for extreme-edge scientific neural networks. It provides architectural insights and optimization strategies, and demonstrates successful deployments on FPGA SoCs.

Contribution

The paper introduces the LARE metric for comparing AI Engines with programmable logic, and presents dataflow optimizations for low-latency scientific inference.

Findings

01. AI Engines outperform programmable logic for larger models, as measured by the LARE metric.

02. Spatial and API-level dataflow optimizations reduce inference latency.

03. End-to-end neural networks that do not fit on programmable logic can be deployed on AI Engines.

Abstract

Extreme-edge scientific applications use machine learning models to analyze sensor data and make real-time decisions. Their stringent latency and throughput requirements demand small batch sizes and require that model weights remain fully on-chip. Spatial dataflow implementations are common for extreme-edge applications. Spatial dataflow works well for small networks, but it fails to scale to larger models due to inherent resource scaling limitations. AI Engines on modern FPGA SoCs offer a promising alternative with high compute density and additional on-chip memory. However, the architecture, programming model, and performance-scaling behavior of AI Engines differ fundamentally from those of the programmable logic, making direct comparison non-trivial and the benefits of using AI Engines unclear. This work addresses how and when extreme-edge scientific neural networks should be…
