CUTIE: Beyond PetaOp/s/W Ternary DNN Inference Acceleration with Better-than-Binary Energy Efficiency

Moritz Scherer; Georg Rutishauser; Lukas Cavigelli; Luca Benini

arXiv:2011.01713 · cs.AR · February 5, 2021

TL;DR

This paper introduces CUTIE, a fully digital hardware accelerator for ternary neural networks that reaches 3.1 POp/s/W and outperforms binary neural network accelerators in energy efficiency while matching or exceeding their accuracy.

Contribution

The paper presents a fully digital hardware architecture for ternary neural networks, completely unrolled in the feature map and filter dimensions, that lowers energy consumption by minimizing switching activity. It is paired with an optimized training method that increases the sparsity of the filter weights.
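The training method itself is not described on this page, so the following is only a minimal sketch of how a threshold controls weight sparsity. It swaps in the threshold rule from Ternary Weight Networks (TWN), delta = 0.7 · mean(|w|), which is not claimed to be CUTIE's method; the function names `ternarize` and `sparsity` and the `delta_factor` parameter are hypothetical. Raising the threshold zeroes more weights, which is the property a sparsity-oriented training method would exploit.

```python
from typing import List, Tuple


def ternarize(weights: List[float], delta_factor: float = 0.7) -> Tuple[List[int], float]:
    """Map real-valued weights to {-1, 0, +1} using a magnitude threshold.

    Assumption: the threshold rule delta = delta_factor * mean(|w|) is borrowed
    from Ternary Weight Networks as an illustration; it is not CUTIE's own
    training method.
    """
    if not weights:
        raise ValueError("weights must be non-empty")
    mean_abs = sum(abs(w) for w in weights) / len(weights)
    delta = delta_factor * mean_abs
    # Weights at or below the threshold become 0; the rest keep only their sign.
    ternary = [0 if abs(w) <= delta else (1 if w > 0 else -1) for w in weights]
    # Per-layer scale: mean magnitude of the weights that stayed nonzero.
    kept = [abs(w) for w, t in zip(weights, ternary) if t != 0]
    alpha = sum(kept) / len(kept) if kept else 0.0
    return ternary, alpha


def sparsity(ternary: List[int]) -> float:
    """Fraction of ternary weights that are exactly zero."""
    return ternary.count(0) / len(ternary)


# Usage: a larger delta_factor produces more zero weights.
w = [0.9, -0.05, 0.4, -0.6, 0.02, -0.3, 0.1, 0.8]
for factor in (0.7, 1.2):
    t, alpha = ternarize(w, factor)
    print(factor, t, sparsity(t))
```

In a real training flow the threshold would be applied in the forward pass while full-precision weights are kept for gradient updates; this sketch shows only the quantization step.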

Findings

01

Achieves 3.1 POp/s/W energy efficiency

02

Reduces inference energy cost by up to 21x compared with state-of-the-art accelerators

03

Maintains or improves accuracy compared with state-of-the-art accelerators
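The headline efficiency converts directly to energy per operation, because 1 W = 1 J/s and therefore Op/s/W = Op/J. The "up to 21x" figure likewise bounds the remaining energy relative to a reference accelerator. This is a worked conversion of the numbers above, not a result reported on this page; E_ref denotes a hypothetical reference design's inference energy.

```latex
\frac{1}{3.1\,\mathrm{POp\,s^{-1}\,W^{-1}}}
  = \frac{1}{3.1 \times 10^{15}\,\mathrm{Op\,J^{-1}}}
  \approx 3.2 \times 10^{-16}\,\mathrm{J/Op}
  \approx 0.32\,\mathrm{fJ/Op}

E_{\mathrm{CUTIE}} \geq \frac{E_{\mathrm{ref}}}{21} \approx 0.048\,E_{\mathrm{ref}}
```

If, as is common for accelerators, one multiply-accumulate is counted as two operations (an assumption, since the convention is not stated here), this corresponds to roughly 0.65 fJ per MAC, and the 21x case amounts to a saving of about 95%.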

Abstract

We present a 3.1 POp/s/W fully digital hardware accelerator for ternary neural networks. CUTIE, the Completely Unrolled Ternary Inference Engine, focuses on minimizing non-computational energy and switching activity so that dynamic power spent on storing (locally or globally) intermediate results is minimized. This is achieved by 1) a data path architecture completely unrolled in the feature map and filter dimensions to reduce switching activity by favoring silencing over iterative computation and maximizing data re-use, 2) targeting ternary neural networks which, in contrast to binary NNs, allow for sparse weights which reduce switching activity, and 3) introducing an optimized training method for higher sparsity of the filter weights, resulting in a further reduction of the switching activity. Compared with state-of-the-art accelerators, CUTIE achieves greater or equal accuracy while…
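The abstract's point that ternary values allow silencing can be illustrated at the arithmetic level. In this minimal sketch, operands are restricted to {-1, 0, +1}, so every product is either skipped or a ±1 update, and a term with a zero operand never touches the accumulator. The function name `ternary_dot` and the `active_terms` counter are hypothetical; the counter is only a crude proxy for switching activity, not the paper's power model. A completely unrolled data path would evaluate all terms of a window in parallel rather than in this sequential loop.

```python
from typing import List, Tuple

TERNARY = (-1, 0, 1)


def ternary_dot(activations: List[int], weights: List[int]) -> Tuple[int, int]:
    """Dot product of ternary vectors; returns (result, active_terms).

    active_terms counts the nonzero products, used here as a rough stand-in
    for how many terms would cause switching in hardware (an assumption).
    """
    if len(activations) != len(weights):
        raise ValueError("activations and weights must have the same length")
    acc = 0
    active_terms = 0
    for a, w in zip(activations, weights):
        if a not in TERNARY or w not in TERNARY:
            raise ValueError("operands must be in {-1, 0, +1}")
        if a == 0 or w == 0:
            continue  # zero product: silenced, accumulator left untouched
        # The product of two nonzero ternary values is +1 if signs match, else -1.
        acc += 1 if a == w else -1
        active_terms += 1
    return acc, active_terms


acts = [1, -1, 0, 1, 1, -1]
ternary_w = [1, 0, -1, 1, 0, -1]  # sparse: two zero weights
binary_w = [1, 1, -1, 1, 1, -1]   # binary weights have no zero level

print(ternary_dot(acts, ternary_w))
print(ternary_dot(acts, binary_w))
```

Both calls return the same result here (3), but the ternary weights leave 3 active terms versus 5 for the binary weights, because only ternary weights can silence terms on the weight side.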
