TL;DR
NEURA is a compilation framework that transforms control flow in CGRA kernels into a unified dataflow IR, enabling retargeting across architectures, with reported speedups of 2.20x on kernel benchmarks and up to 2.71x (geometric mean) on real-world applications.
Contribution
NEURA introduces a dataflow IR with embedded control predicates, which lets the compiler systematically flatten control flow and retarget kernels to diverse CGRAs.
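To make the idea of embedding control predicates in dataflow more concrete, here is a minimal Python sketch of flattening a single branch into predicated dataflow. It is an illustration only and not NEURA's actual IR: the `Token`, `grant`, `op`, and `merge` names are hypothetical, chosen to model a value that carries a validity predicate alongside its data.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Token:
    """A dataflow value paired with a validity predicate (hypothetical model)."""
    value: Any
    pred: bool


def op(fn: Callable[..., Any], *inputs: Token) -> Token:
    """Apply fn to the inputs; the result is valid only if every input is valid."""
    valid = all(t.pred for t in inputs)
    value = fn(*(t.value for t in inputs)) if valid else None
    return Token(value, valid)


def grant(tok: Token, cond: Token, polarity: bool) -> Token:
    """Forward tok, keeping it valid only when cond is valid and equals polarity."""
    return Token(tok.value, tok.pred and cond.pred and cond.value == polarity)


def merge(a: Token, b: Token) -> Token:
    """Join two mutually exclusive paths by picking the valid token."""
    assert not (a.pred and b.pred), "paths must be mutually exclusive"
    return a if a.pred else b


def branchy(x: int) -> int:
    """Reference version written with ordinary control flow."""
    if x > 0:
        return x * 2
    return x - 1


def flattened(x: int) -> int:
    """Same computation with the branch replaced by predicated dataflow."""
    xt = Token(x, True)
    cond = op(lambda v: v > 0, xt)
    then_out = op(lambda v: v * 2, grant(xt, cond, True))
    else_out = op(lambda v: v - 1, grant(xt, cond, False))
    return merge(then_out, else_out).value
```

In this sketch both sides of the branch exist as nodes in one graph, and the predicate decides which result is valid, so no program counter or branch instruction is needed. How NEURA encodes predicates and handles loops is described in the paper itself and is not reproduced here.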
Findings
2.20x speedup on kernel benchmarks when targeting a high-performance CGRA
Up to 2.71x geometric mean speedup on real-world applications
Competitive performance on low-power CGRA architectures
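For readers unfamiliar with the metric, a geometric mean speedup is the n-th root of the product of per-benchmark speedup ratios. The sketch below shows the arithmetic in Python; the function name `geomean_speedup` and the timing values in the usage note are hypothetical and are not taken from the paper.

```python
import math
from typing import Sequence


def geomean_speedup(baseline_times: Sequence[float], new_times: Sequence[float]) -> float:
    """Geometric mean of per-benchmark speedups (baseline time / new time)."""
    if len(baseline_times) != len(new_times) or not baseline_times:
        raise ValueError("need two non-empty sequences of equal length")
    ratios = [b / n for b, n in zip(baseline_times, new_times)]
    # Average the logs, then exponentiate: equivalent to the n-th root of the product.
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))
```

With made-up timings where one benchmark speeds up 4x and another 9x, `geomean_speedup([4.0, 9.0], [1.0, 1.0])` gives 6.0, whereas the arithmetic mean would give 6.5. The geometric mean is the usual choice for ratios because a single outlier benchmark pulls it less.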
Abstract
Coarse-Grained Reconfigurable Architectures (CGRAs) are a promising and versatile accelerator platform, balancing the performance and efficiency of specialized accelerators with software programmability. However, their full potential is severely hindered by control flow in accelerated kernels, because control flow (e.g., loops, branches) is fundamentally incompatible with the parallel, data-driven CGRA fabric. Prior strategies for resolving this mismatch are either inefficient, sacrificing performance for generality, or lack generality because they are difficult to adapt across different execution models. Thus, a general and unified solution for efficient CGRA kernel acceleration remains elusive. This paper introduces NEURA, a unified and retargetable compilation framework that systematically resolves the control-dataflow mismatch in…
