NX-CGRA: A Programmable Hardware Accelerator for Core Transformer Algorithms on Edge Devices
Rohit Prasad

TL;DR
NX-CGRA is a flexible, programmable hardware accelerator tailored for diverse transformer workloads on edge devices, balancing performance and energy efficiency through a reconfigurable architecture.
Contribution
The paper presents NX-CGRA, a novel CGRA-based hardware accelerator that supports a wide range of transformer inference algorithms with software programmability.
Findings
High efficiency across various transformer kernels
Favorable energy-area tradeoffs demonstrated
Scalable for edge deployment under power constraints
Abstract
The increasing diversity and complexity of transformer workloads at the edge present significant challenges in balancing performance, energy efficiency, and architectural flexibility. This paper introduces NX-CGRA, a programmable hardware accelerator designed to support a range of transformer inference algorithms, including both linear and non-linear functions. Unlike fixed-function accelerators optimized for narrow use cases, NX-CGRA employs a coarse-grained reconfigurable array (CGRA) architecture with software-driven programmability, enabling efficient execution across varied kernel patterns. The architecture is evaluated using representative benchmarks derived from real-world transformer models, demonstrating high overall efficiency and favorable energy-area tradeoffs across different classes of operations. These results indicate the potential of NX-CGRA as a scalable and adaptable…
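To make the abstract's distinction between linear and non-linear functions concrete, here is a minimal Python sketch of the kinds of kernels a transformer inference pass mixes. It is an illustration only and is not taken from the paper: the function names (matmul, softmax, gelu, attention) and the choice of single-head scaled dot-product attention are assumptions, and nothing here describes how NX-CGRA maps or schedules these operations.

```python
import math

def matmul(a, b):
    """Linear kernel: dense matrix product of row-major nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    """Non-linear kernel: numerically stable softmax over one row."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def gelu(x):
    """Non-linear kernel: tanh approximation of GELU."""
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))

def attention(q, k, v):
    """Hypothetical single-head attention: two matmuls around a row-wise softmax."""
    d = len(q[0])
    k_t = [list(col) for col in zip(*k)]           # transpose K
    scores = matmul(q, k_t)                        # linear
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]  # non-linear
    return matmul(weights, v)                      # linear

if __name__ == "__main__":
    q = [[1.0, 0.0], [0.0, 1.0]]
    v = [[1.0, 2.0], [3.0, 4.0]]
    print(attention(q, q, v))
    print([gelu(x) for x in (-1.0, 0.0, 1.0)])
```

The sketch shows why a fixed-function matrix engine alone does not cover the workload: the softmax and GELU steps need exponentials, reductions, and divisions that sit between the matrix products.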
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Embedded Systems Design Techniques · Advanced Neural Network Applications
