Re-thinking Memory-Bound Limitations in CGRAs
Xiangfeng Liu, Zhe Jiang, Anzhen Zhu, Xiaomeng Han, Mingsong Lyu, Qingxu Deng, and Nan Guan

TL;DR
This paper introduces a redesigned memory subsystem for CGRAs that manages irregular memory accesses, significantly improving performance and reducing storage needs for complex workloads with irregular data patterns.
Contribution
It proposes a novel memory model and microarchitectural optimizations, including runahead execution and cache reconfiguration, to enhance CGRA performance on irregular memory access workloads.
Findings
Achieves 3.04x average speedup with runahead execution
Reduces storage size to 1.27% of original
Improves performance for irregular memory patterns
Abstract
Coarse-Grained Reconfigurable Arrays (CGRAs) are specialized accelerators commonly employed to boost performance in workloads with iterative structures. Existing research typically focuses on compiler or architecture optimizations aimed at improving CGRA performance, energy efficiency, flexibility, and area utilization, under the idealistic assumption that kernels can access all data from Scratchpad Memory (SPM). However, certain complex workloads, particularly in fields like graph analytics, irregular database operations, and specialized forms of high-performance computing (e.g., unstructured mesh simulations), exhibit irregular memory access patterns that hinder CGRA utilization, which sometimes drops below 1.5%, making the CGRA memory-bound. To address this challenge, we conduct a thorough analysis of the underlying causes of performance degradation, then propose a redesigned memory…
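
To make the notion of an irregular memory access pattern concrete, the following is a minimal sketch and is not taken from the paper. It compares how many distinct cache lines a streaming access and an indirect, gather-style access touch for the same number of elements. The helper `cache_lines_touched`, the 4-byte element size, the 64-byte line size, and the synthetic index pattern are all assumptions chosen for illustration.

```python
def cache_lines_touched(indices, elem_bytes=4, line_bytes=64):
    """Count distinct cache lines touched by a sequence of element indices.

    elem_bytes and line_bytes are illustrative assumptions, not values
    from the paper.
    """
    return len({(i * elem_bytes) // line_bytes for i in indices})


# Regular (streaming) access: a[0], a[1], ..., a[63]
regular = list(range(64))

# Irregular (indirect) access, in the style of a gather a[idx[k]].
# The index array is synthetic: a large stride stands in for
# data-dependent addresses such as graph neighbor lists.
irregular = [(k * 1031) % 65536 for k in range(64)]

regular_lines = cache_lines_touched(regular)
irregular_lines = cache_lines_touched(irregular)

print(f"regular:   {regular_lines} cache lines for {len(regular)} elements")
print(f"irregular: {irregular_lines} cache lines for {len(irregular)} elements")
```

In this toy setting the streaming loop reuses each fetched line for 16 elements, while the gather touches a new line on every access. A scratchpad sized for the regular case cannot hold that working set, which is the kind of behavior the abstract describes as making the CGRA memory-bound.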
