Demystifying the 7-D Convolution Loop Nest for Data and Instruction Streaming in Reconfigurable AI Accelerators

Md Rownak Hossain Chowdhury; Mostafizur Rahman

arXiv:2507.20420 · cs.AR · July 29, 2025

TL;DR

This paper introduces a hardware-centric framework for implementing the 7-D convolution loop nest on reconfigurable AI accelerators, reducing control overhead and improving data reuse during neural network inference.

Contribution

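The paper reinterprets the 7-D convolution loop nest as a data and instruction streaming problem, enabling flexible, lightweight deployment on reconfigurable hardware without the heavy transformations (such as loop unrolling or GEMM-based matrix conversion) that existing approaches rely on.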

Findings

01. Over 90% PE utilization on the MAVeC accelerator

02. 1.56 TFLOPS throughput on VGG-16

03. Full VGG-16 inference supported, with scalable performance

Abstract

Convolution remains the most compute-intensive operation in AI acceleration, often constituting over 80-90% of the workload. Existing approaches in spatial architectures such as coarse-grained reconfigurable arrays (CGRAs) and field-programmable gate arrays (FPGAs) frequently rely on loop unrolling or GEMM-based matrix transformations, introducing significant overhead in both data movement and instruction control. This paper presents a new framework designed to systematically demystify the 7-dimensional convolution loop nest by reinterpreting it as a hardware-centric data and instruction streaming problem. Instead of treating the loop nest as a fixed computational construct, our approach exposes its structure as a set of spatial and temporal mappings governed by hardware parameters such as compute element distribution, interconnect topology, and reconfigurability. This abstraction…
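
For readers unfamiliar with the term, the 7-D convolution loop nest referred to above can be sketched as seven nested loops over batch, output channel, output row, output column, input channel, kernel row, and kernel column. The following is a minimal reference sketch in plain Python, not the paper's method: the function name `conv2d_7d`, the dimension letters (N, K, C, P, Q, R, S), the loop order, and the no-padding assumption are all illustrative choices, since the excerpt does not give the authors' notation or mapping. As in most deep learning frameworks, the kernel is not flipped, so this is strictly a cross-correlation.

```python
def conv2d_7d(inputs, weights, stride=1):
    """Direct convolution written as a 7-deep loop nest (no padding, no dilation).

    inputs:  nested lists shaped [N][C][H][W]
    weights: nested lists shaped [K][C][R][S]
    returns: nested lists shaped [N][K][P][Q]
    All names are illustrative assumptions, not the paper's notation.
    """
    N, C = len(inputs), len(inputs[0])
    H, W = len(inputs[0][0]), len(inputs[0][0][0])
    K = len(weights)
    R, S = len(weights[0][0]), len(weights[0][0][0])

    # Output spatial size for a "valid" convolution.
    P = (H - R) // stride + 1
    Q = (W - S) // stride + 1

    out = [[[[0.0] * Q for _ in range(P)] for _ in range(K)] for _ in range(N)]

    for n in range(N):                          # 1: batch
        for k in range(K):                      # 2: output channel
            for p in range(P):                  # 3: output row
                for q in range(Q):              # 4: output column
                    acc = 0.0
                    for c in range(C):          # 5: input channel
                        for r in range(R):      # 6: kernel row
                            for s in range(S):  # 7: kernel column
                                acc += (inputs[n][c][p * stride + r][q * stride + s]
                                        * weights[k][c][r][s])
                    out[n][k][p][q] = acc
    return out
```

The four outer loops are independent of one another, while the three inner loops form a reduction. Which of these loops are laid out across compute elements (spatial) and which are stepped through over time (temporal) is the kind of mapping decision the abstract describes; this sketch only fixes the arithmetic that any such mapping must reproduce.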
