CREW: Computation Reuse and Efficient Weight Storage for Hardware-accelerated MLPs and RNNs
Marc Riera, Jose-Maria Arnau, Antonio Gonzalez

TL;DR
CREW is a hardware accelerator designed to improve energy efficiency and speed for fully-connected layers in modern DNNs by exploiting repeated weights through computation reuse and efficient storage.
Contribution
CREW introduces a novel hardware architecture that leverages repeated weights in FC layers for computation reuse and reduced storage, outperforming prior computation-reuse techniques such as UCNN.
Findings
CREW achieves 2.61x speedup over TPU-like accelerators.
CREW reduces energy consumption by 2.42x relative to a TPU-like accelerator.
CREW outperforms UCNN with 2.10x speedup.
Abstract
Deep Neural Networks (DNNs) have achieved tremendous success for cognitive applications. The core operation in a DNN is the dot product between quantized inputs and weights. Prior works exploit the weight/input repetition that arises due to quantization to avoid redundant computations in Convolutional Neural Networks (CNNs). However, in this paper we show that their effectiveness is severely limited when applied to Fully-Connected (FC) layers, which are commonly used in state-of-the-art DNNs, as is the case for modern Recurrent Neural Networks (RNNs) and Transformer models. To improve the energy efficiency of FC computation, we present CREW, a hardware accelerator that implements Computation Reuse and an Efficient Weight Storage mechanism to exploit the large number of repeated weights in FC layers. CREW first performs the multiplications of the unique weights by their respective inputs…
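The reuse idea in the abstract can be illustrated with a small software sketch. This is not the CREW hardware design. Only the first step (multiplying each unique weight by its input once) comes from the abstract. The second step, in which every output neuron looks up the stored product instead of multiplying again, is an assumption about how the reuse could be completed. The function names `fc_with_reuse` and `fc_baseline` and the `weights[j][i]` layout are hypothetical.

```python
def fc_baseline(inputs, weights):
    """Plain FC layer: one multiplication per (output, input) pair.

    weights[j][i] is the quantized weight from input i to output j.
    """
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]


def fc_with_reuse(inputs, weights):
    """FC layer that multiplies each unique weight of an input only once.

    Returns the outputs and the number of multiplications performed.
    """
    n_out = len(weights)
    outputs = [0] * n_out
    multiplications = 0
    for i, x in enumerate(inputs):
        # All weights that multiply input i (one per output neuron).
        column = [weights[j][i] for j in range(n_out)]
        # Step 1 (from the abstract): multiply unique weights by the input.
        products = {}
        for w in column:
            if w not in products:
                products[w] = w * x
                multiplications += 1
        # Step 2 (assumed): each output reuses the stored product.
        for j in range(n_out):
            outputs[j] += products[column[j]]
    return outputs, multiplications


if __name__ == "__main__":
    inputs = [2, -1, 3]
    weights = [
        [1, 0, 2],
        [1, 3, 2],
        [-1, 0, 2],
        [1, 3, -1],
    ]
    outputs, mults = fc_with_reuse(inputs, weights)
    print(outputs, mults)                 # [8, 5, 4, -4] 6
    print(fc_baseline(inputs, weights))   # [8, 5, 4, -4]
```

In this toy case each input sees only two distinct weight values across the four outputs, so the layer needs 6 multiplications instead of 12. The saving grows as quantization leaves fewer unique weights per input.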
Taxonomy
Topics: Ferroelectric and Negative Capacitance Devices · Advanced Neural Network Applications · Advanced Memory and Neural Computing
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Label Smoothing · Residual Connection · Layer Normalization · Softmax · Dense Connections
