Systolic Tensor Array: An Efficient Structured-Sparse GEMM Accelerator for Mobile CNN Inference

Zhi-Gang Liu; Paul N. Whatmough; Matthew Mattina

arXiv:2005.08098 · cs.DC · May 19, 2020 · 1 citation

TL;DR

This paper introduces the Systolic Tensor Array (STA), an optimized hardware architecture for CNN inference on mobile devices, featuring tensor processing elements and support for block-sparse data formats to improve efficiency and reduce power consumption.

Contribution

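The paper generalizes the traditional systolic array into a tensor-based architecture and adds support for a novel block-sparse format. At iso-throughput with INT8 operands, the dense STA family reduces circuit area by up to 2.08x and power dissipation by up to 1.36x compared to the conventional SA.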

Findings

01. STA reduces circuit area by up to 2.08x compared to a traditional SA.

02. STA-DBB achieves up to 3.14x area and 1.97x power improvements over the baseline.

03. The architecture supports both dense and sparse models efficiently.
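
To make the idea of a block-sparse weight format concrete, here is a minimal sketch in plain Python. It assumes a layout in which weights are split into fixed-size blocks and each block may hold at most a bounded number of nonzeros. That layout, the function names `compress_blocks` and `sparse_dot`, and the parameters `block=4` and `max_nnz=2` are illustrative assumptions; this summary does not spell out the details of the paper's format.

```python
# Illustrative sketch only: the block size, the nonzero bound, and the
# (index, value) storage layout are assumptions, not taken from the paper.

def compress_blocks(weights, block=4, max_nnz=2):
    """Split `weights` into fixed-size blocks and keep only the nonzeros.

    Each block becomes a list of (offset_in_block, value) pairs.
    Raises ValueError if any block holds more than `max_nnz` nonzeros.
    """
    if len(weights) % block != 0:
        raise ValueError("weight count must be a multiple of the block size")
    packed = []
    for start in range(0, len(weights), block):
        chunk = weights[start:start + block]
        nonzeros = [(i, w) for i, w in enumerate(chunk) if w != 0]
        if len(nonzeros) > max_nnz:
            raise ValueError(f"block at offset {start} exceeds {max_nnz} nonzeros")
        packed.append(nonzeros)
    return packed


def sparse_dot(packed, activations, block=4):
    """Dot product of compressed weights with dense activations.

    Only the stored nonzeros are multiplied, so the work per block is
    bounded by the nonzero limit rather than by the block size.
    """
    total = 0
    for block_index, nonzeros in enumerate(packed):
        base = block_index * block
        for offset, weight in nonzeros:
            total += weight * activations[base + offset]
    return total
```

Because the number of nonzeros per block is bounded, a consumer of this format can provision a fixed number of multipliers per block, which is the general appeal of structured over unstructured sparsity.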

Abstract

Convolutional neural network (CNN) inference on mobile devices demands efficient hardware acceleration of low-precision (INT8) general matrix multiplication (GEMM). The systolic array (SA) is a pipelined 2D array of processing elements (PEs) with very efficient local data movement; it is well suited to accelerating GEMM and widely deployed in industry. In this work, we describe two significant improvements to the traditional SA architecture that specifically optimize it for CNN inference. Firstly, we generalize the traditional scalar PE into a Tensor-PE, which gives rise to a family of new Systolic Tensor Array (STA) microarchitectures. The STA family increases intra-PE operand reuse and datapath efficiency, reducing circuit area and power dissipation by as much as 2.08x and 1.36x respectively, compared to the conventional SA at iso-throughput with INT8 operands. Secondly, we…
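
The step from a scalar PE to a Tensor-PE can be illustrated at the arithmetic level with a small sketch. The code below models only the computation, not the pipelining or data movement of real hardware: a hypothetical `tensor_pe` function multiplies a small operand tile pair and accumulates the result, and `tiled_gemm` covers a full matrix product with such tiles. The function names and the tile parameters `tm`, `tk`, `tn` are assumptions made for illustration. Setting all three to 1 gives the scalar case of one multiply-accumulate per call.

```python
# Arithmetic-level sketch only; names and tile sizes are illustrative
# assumptions and do not describe the paper's microarchitecture.

def tensor_pe(a_tile, b_tile, acc):
    """Accumulate a_tile (tm x tk) times b_tile (tk x tn) into acc (tm x tn).

    Within one call, each element of a_tile is reused across all tn output
    columns and each element of b_tile across all tm output rows.
    """
    depth = len(b_tile)
    for i, a_row in enumerate(a_tile):
        for j in range(len(acc[0])):
            acc[i][j] += sum(a_row[p] * b_tile[p][j] for p in range(depth))
    return acc


def tiled_gemm(a, b, tm=2, tk=2, tn=2):
    """Compute a (M x K) times b (K x N) by dispatching tiles to tensor_pe.

    Python integers stand in for a wide accumulator, so INT8 inputs
    cannot overflow in this model.
    """
    M, K, N = len(a), len(b), len(b[0])
    if M % tm or K % tk or N % tn:
        raise ValueError("matrix dimensions must be multiples of the tile sizes")
    c = [[0] * N for _ in range(M)]
    for i0 in range(0, M, tm):
        for j0 in range(0, N, tn):
            acc = [[0] * tn for _ in range(tm)]
            for k0 in range(0, K, tk):
                a_tile = [row[k0:k0 + tk] for row in a[i0:i0 + tm]]
                b_tile = [row[j0:j0 + tn] for row in b[k0:k0 + tk]]
                tensor_pe(a_tile, b_tile, acc)
            for i in range(tm):
                c[i0 + i][j0:j0 + tn] = acc[i]
    return c
```

With a 2 x 2 x 2 tile, one call performs eight multiplies on eight loaded operands, whereas eight scalar multiply-accumulates would load sixteen. That ratio is one way to read the abstract's "intra-PE operand reuse".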

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Tensor decomposition and applications · Advanced Memory and Neural Computing