ReDas: A Lightweight Architecture for Supporting Fine-Grained Reshaping and Multiple Dataflows on Systolic Array

Meng Han; Liang Wang; Limin Xiao; Tianhao Cai; Zeyu Wang; Xiangrong Xu; Chenhao Zhang

arXiv:2302.07520 · cs.AR · May 16, 2024

TL;DR

ReDas introduces a lightweight, reconfigurable systolic array architecture that enables fine-grained reshaping and multiple dataflows, significantly improving efficiency and utilization for DNN acceleration.

Contribution

It presents a novel, lightweight design supporting dynamic fine-grained reshaping and multiple dataflows with minimal hardware overhead.

Findings

01. Achieves 4.6x speedup over conventional systolic arrays.

02. Reduces energy-delay product by 8.3x.

03. Supports up to 129 different logical shapes and 3 dataflows.

Abstract

The systolic accelerator is one of the premier architectural choices for DNN acceleration. However, the conventional systolic architecture suffers from low PE utilization due to the mismatch between the fixed array and diverse DNN workloads. Recent studies have proposed flexible systolic array architectures to adapt to DNN models. However, these designs support only coarse-grained reshaping or significantly increase hardware overhead. In this study, we propose ReDas, a flexible and lightweight systolic array that supports dynamic fine-grained reshaping and multiple dataflows. First, ReDas integrates lightweight and reconfigurable roundabout data paths, which achieve fine-grained reshaping using only short connections between adjacent PEs. Second, we redesign the PE microarchitecture and integrate a set of multi-mode data buffers around the array. The PE structure enables additional data…
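The utilization problem described in the abstract can be illustrated with a rough sketch. The model below is an assumption for illustration, not the paper's method: it tiles a `k x n` operand onto a `rows x cols` array, counts the fraction of PE slots doing useful work, and then picks the best logical shape among factor pairs with the same PE count. The 128 x 128 array size, the 32 x 512 operand, and the helper names `utilization`, `logical_shapes`, and `best_shape` are all hypothetical; the reshaping set that ReDas actually supports is not modeled here.

```python
from math import ceil


def utilization(rows, cols, k, n):
    """Fraction of PE slots doing useful work when a k x n operand
    is tiled onto a rows x cols array (simplified, assumed model)."""
    tiles = ceil(k / rows) * ceil(n / cols)
    return (k * n) / (tiles * rows * cols)


def logical_shapes(num_pes):
    """All (rows, cols) factor pairs with rows * cols == num_pes.
    Hypothetical stand-in for a set of reshaped logical arrays."""
    return [(r, num_pes // r) for r in range(1, num_pes + 1) if num_pes % r == 0]


def best_shape(num_pes, k, n):
    """Logical shape with the highest utilization for a k x n operand."""
    return max(logical_shapes(num_pes), key=lambda s: utilization(s[0], s[1], k, n))


if __name__ == "__main__":
    k, n = 32, 512                      # hypothetical skinny operand
    fixed = utilization(128, 128, k, n)  # fixed square array
    shape = best_shape(128 * 128, k, n)  # same PE count, reshaped
    print(f"fixed 128x128 utilization: {fixed:.2f}")
    print(f"best logical shape: {shape}, utilization: {utilization(*shape, k, n):.2f}")
```

In this toy model the fixed square array leaves most PE slots idle for the skinny operand, while a logical shape matching the operand fills every slot. Real arrays add constraints (data path routing, buffer bandwidth, dataflow choice) that this sketch ignores.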

Taxonomy

Topics: Advanced Data Storage Technologies · Parallel Computing and Optimization Techniques · Advanced Memory and Neural Computing