CREW: Computation Reuse and Efficient Weight Storage for Hardware-accelerated MLPs and RNNs

Marc Riera; Jose-Maria Arnau; Antonio Gonzalez

arXiv:2107.09408·cs.AR·March 14, 2022

TL;DR

CREW is a hardware accelerator designed to improve energy efficiency and speed for fully-connected layers in modern DNNs by exploiting repeated weights through computation reuse and efficient storage.

Contribution

CREW introduces a hardware architecture that exploits repeated weights in FC layers to reuse computations and reduce weight storage, outperforming prior computation-reuse techniques such as UCNN.

Findings

01

CREW achieves 2.61x speedup over TPU-like accelerators.

02

CREW reduces energy consumption by 2.42x relative to the same TPU-like baseline.

03

CREW outperforms UCNN with 2.10x speedup.

Abstract

Deep Neural Networks (DNNs) have achieved tremendous success for cognitive applications. The core operation in a DNN is the dot product between quantized inputs and weights. Prior works exploit the weight/input repetition that arises due to quantization to avoid redundant computations in Convolutional Neural Networks (CNNs). However, in this paper we show that their effectiveness is severely limited when applied to Fully-Connected (FC) layers, which are commonly used in state-of-the-art DNNs, as is the case in modern Recurrent Neural Networks (RNNs) and Transformer models. To improve the energy efficiency of FC computation, we present CREW, a hardware accelerator that implements Computation Reuse and an Efficient Weight Storage mechanism to exploit the large number of repeated weights in FC layers. CREW first performs the multiplications of the unique weights by their respective inputs…
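The abstract is cut off here, so the following is only a minimal software sketch of the general idea it states: multiply each input by the unique quantized weights attached to it once, then reuse those products wherever a weight repeats. The function names `fc_dense` and `fc_reuse`, the `weights[i][j]` layout, and the way cached products are accumulated into outputs are assumptions made for illustration and are not taken from the paper or its hardware design.

```python
def fc_dense(inputs, weights):
    """Baseline FC layer: one multiplication per (input, output) pair.

    weights[i][j] is the quantized weight connecting input i to output j
    (an assumed layout, chosen for this illustration).
    """
    n_out = len(weights[0])
    out = [0] * n_out
    for i, x in enumerate(inputs):
        for j in range(n_out):
            out[j] += x * weights[i][j]
    return out


def fc_reuse(inputs, weights):
    """Same result, but each input is multiplied only by its unique weights.

    Returns (outputs, number_of_multiplications_performed).
    """
    n_out = len(weights[0])
    out = [0] * n_out
    mults = 0
    for i, x in enumerate(inputs):
        # One multiplication per distinct weight value seen by this input.
        products = {w: x * w for w in set(weights[i])}
        mults += len(products)
        # Repeated weights reuse the cached product; only additions remain.
        for j in range(n_out):
            out[j] += products[weights[i][j]]
    return out, mults


if __name__ == "__main__":
    inputs = [2, -1, 3]
    weights = [
        [1, 1, -2, 1],     # 2 unique values
        [0, 3, 3, 0],      # 2 unique values
        [-1, -1, -1, -1],  # 1 unique value
    ]
    reused, mults = fc_reuse(inputs, weights)
    assert reused == fc_dense(inputs, weights)
    print(reused, mults, "multiplications instead of", len(inputs) * len(weights[0]))
```

In this toy case the reuse version performs 5 multiplications where the dense loop performs 12. The saving grows with the number of outputs per input and shrinks with the number of distinct quantized values, which is why heavy weight repetition in FC layers matters.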

Taxonomy

Topics: Ferroelectric and Negative Capacitance Devices · Advanced Neural Network Applications · Advanced Memory and Neural Computing

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Label Smoothing · Residual Connection · Layer Normalization · Softmax · Dense Connections