DICE: Enabling Efficient General-Purpose SIMT Execution with Statically Scheduled Coarse-Grained Reconfigurable Arrays
Jiayi Wang, Ang Da Lu, Zhichen Zeng, Ang Li

TL;DR
DICE replaces the SIMD execution units in GPUs with statically scheduled coarse-grained reconfigurable arrays (CGRAs), reducing energy consumption while maintaining comparable performance.
Contribution
The paper proposes a CGRA-based SIMT backend that keeps scheduling static while still handling runtime variability, such as variable-latency memory loads and data-dependent control flow, achieving high energy efficiency on GPU-like workloads.
Findings
Reduces register file accesses by 68% on average.
Achieves 1.77-1.90x energy efficiency improvements.
Maintains comparable performance to traditional GPU architectures.
Abstract
While GPUs dominate massively parallel computing through the single-instruction, multiple-thread (SIMT) programming model, their underlying single-instruction, multiple-data (SIMD) execution incurs substantial energy overhead from frequent register file (RF) accesses and complex control logic. We present DICE, a novel architecture that addresses these inefficiencies by replacing the SIMD backend with minimal-overhead, statically scheduled coarse-grained reconfigurable arrays (CGRAs). Unlike SIMD units that execute warps of threads in lockstep, DICE dispatches active threads in a pipelined manner onto the CGRA fabric, where data flow directly between processing elements (PEs), reducing RF accesses for intermediate values. To handle operations with runtime dynamism, such as variable-latency memory loads and data-dependent control flow, while preserving static scheduling, DICE compiles…
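To make the register-file argument in the abstract concrete, here is a minimal sketch of a toy accounting model. It is not the paper's methodology: the `Op` type, the two counting functions, and the one-access-per-operand cost model are all assumptions introduced for illustration. It compares a SIMD-style model, where every instruction reads its sources from the RF and writes its result back, against a dataflow-style model, where intermediate values pass directly between PEs and only live-in and live-out values touch the RF.

```python
from dataclasses import dataclass
from typing import Iterable, Sequence, Tuple


@dataclass(frozen=True)
class Op:
    """Hypothetical three-address operation: dst = f(*srcs)."""
    dst: str
    srcs: Tuple[str, ...]


def rf_accesses_simd(ops: Sequence[Op]) -> int:
    # Assumed cost model: each source operand is one RF read and
    # each destination is one RF write, for every instruction.
    return sum(len(op.srcs) + 1 for op in ops)


def rf_accesses_dataflow(ops: Sequence[Op], live_out: Iterable[str]) -> int:
    # Assumed cost model: values produced inside the block travel
    # PE-to-PE, so only distinct live-in reads and live-out writes
    # reach the RF.
    produced = {op.dst for op in ops}
    live_in = {s for op in ops for s in op.srcs if s not in produced}
    written = set(live_out) & produced
    return len(live_in) + len(written)


# Toy block computing out = (a * b + c) * d
ops = [
    Op("t0", ("a", "b")),
    Op("t1", ("t0", "c")),
    Op("out", ("t1", "d")),
]

simd = rf_accesses_simd(ops)
dataflow = rf_accesses_dataflow(ops, live_out=["out"])
print(f"SIMD-style RF accesses:     {simd}")
print(f"Dataflow-style RF accesses: {dataflow}")
```

For this three-operation chain the count drops from 9 to 5 accesses per thread, because `t0` and `t1` never reach the RF. The ratio depends entirely on the toy block and the assumed cost model, so it is not meant to reproduce the 68% average reported in the findings.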
