FlexSA: Flexible Systolic Array Architecture for Efficient Pruned DNN Model Training

Sangkug Lym; Mattan Erez

arXiv:2004.13027·cs.LG·April 29, 2020·5 cites

TL;DR

FlexSA is a reconfigurable systolic array architecture that makes pruned DNN training more efficient by dynamically adapting to tensor sizes and shapes, improving compute resource utilization and reducing energy use.

Contribution

This paper presents FlexSA, a flexible systolic array design with dynamic reconfiguration, together with a compilation heuristic that tiles training workloads to make pruning and training of DNNs more efficient.

Findings

01. 37% better resource utilization over conventional accelerators

02. 1.7X improved on-chip data reuse

03. 28% energy savings

Abstract

Modern deep learning models have high memory and computation cost. To make them fast and memory-cost efficient, structured model pruning is commonly used. We find that pruning a model using a common training accelerator with large systolic arrays is extremely performance-inefficient. To make a systolic array efficient for pruning and training, we propose FlexSA, a flexible systolic array architecture. FlexSA dynamically reconfigures the systolic array structure and offers multiple sub-systolic operating modes, which are designed for energy- and memory bandwidth-efficient processing of tensors with different sizes and shapes. We also present a compilation heuristic for tiling matrix-multiplication-and-accumulation operations in a training workload to best utilize the resources of FlexSA. Based on our evaluation, FlexSA with the proposed compilation heuristic improves compute resource…
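
The under-utilization described in the abstract can be illustrated with a rough sketch. It assumes a deliberately simplified model in which a K x N weight matrix is tiled onto a weight-stationary array of `rows` x `cols` processing elements, and utilization is the fraction of elements that hold useful weights. The function name `pe_utilization`, the 128 x 128 and 64 x 64 array sizes, and the matrix shapes are illustrative assumptions, not values taken from the source, and the model ignores timing, bandwidth, and the FlexSA operating modes themselves.

```python
from math import ceil


def pe_utilization(k: int, n: int, rows: int, cols: int) -> float:
    """Fraction of processing elements holding useful weights.

    Simplified model (an assumption, not the paper's evaluation method):
    a K x N weight matrix is cut into tiles of at most rows x cols, and
    every tile occupies one full pass over the array, padded with idle
    elements where the tile is smaller than the array.
    """
    tiles = ceil(k / rows) * ceil(n / cols)
    return (k * n) / (tiles * rows * cols)


# Hypothetical shapes: a dense layer that fills the array exactly,
# and a structurally pruned layer with fewer, irregular channels.
dense_shape = (256, 256)
pruned_shape = (160, 60)

dense_on_large = pe_utilization(*dense_shape, rows=128, cols=128)
pruned_on_large = pe_utilization(*pruned_shape, rows=128, cols=128)
pruned_on_small = pe_utilization(*pruned_shape, rows=64, cols=64)

print(f"dense  on 128x128: {dense_on_large:.2f}")
print(f"pruned on 128x128: {pruned_on_large:.2f}")
print(f"pruned on  64x64 : {pruned_on_small:.2f}")
```

Under these assumptions the pruned shape leaves most of the large array idle, while the same shape fits smaller sub-arrays much more tightly. This is only meant to show why array granularity matters for pruned tensors, not to reproduce the reported results.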

Taxonomy

Topics: Advanced Neural Network Applications · Parallel Computing and Optimization Techniques · Tensor decomposition and applications

Methods: Pruning