FantastIC4: A Hardware-Software Co-Design Approach for Efficiently Running 4bit-Compact Multilayer Perceptrons

Simon Wiedemann; Suhas Shivapakash; Pablo Wiedemann; Daniel Becking; Wojciech Samek; Friedel Gerfers; Thomas Wiegand

arXiv:2012.11331 · cs.AR · December 22, 2020

TL;DR

This paper introduces FantastIC4, a hardware-software co-design for efficiently running 4-bit compact multilayer perceptrons (MLPs) on edge devices. It combines a novel accelerator architecture with a compression-aware training method to achieve high throughput and power efficiency.

Contribution

It presents a new hardware architecture and an entropy-constrained training method that together enable the efficient execution of 4-bit quantized multilayer perceptrons in edge AI applications.
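The summary does not detail the entropy-constrained training method. As a rough illustration of the underlying idea only, the sketch below shows the assignment step of classic entropy-constrained scalar quantization (ECSQ), which may differ from the paper's actual procedure: each weight is mapped to the codebook entry that minimizes squared error plus a weight lam times that entry's code length, -log2 p_k. With a 16-entry codebook every index fits in 4 bits. The helper name `ec_assign`, the probability floor, and all example values are hypothetical.

```python
import math

def ec_assign(weights, centroids, probs, lam):
    """Entropy-constrained assignment (hypothetical helper).

    Each weight is mapped to the index k minimizing
        (w - centroids[k])**2 + lam * (-log2(probs[k])),
    i.e. squared quantization error plus lam times the code length of k.
    """
    # Code length in bits per index; the floor avoids log2(0) for unused entries.
    rates = [-math.log2(max(p, 1e-12)) for p in probs]
    indices = []
    for w in weights:
        costs = [(w - c) ** 2 + lam * r for c, r in zip(centroids, rates)]
        # Pick the cheapest codebook entry (ties go to the lowest index).
        indices.append(min(range(len(costs)), key=costs.__getitem__))
    return indices

# Toy 3-entry codebook in which the zero entry is assumed to be most frequent.
centroids = [-1.0, 0.0, 1.0]
probs = [0.1, 0.8, 0.1]
weights = [0.6, -0.05, 0.9]

print(ec_assign(weights, centroids, probs, lam=0.0))  # [2, 1, 2]
print(ec_assign(weights, centroids, probs, lam=0.1))  # [1, 1, 2]
```

With lam = 0 this reduces to nearest-neighbour quantization; a positive lam pulls the weight 0.6 onto the frequent zero entry, trading a little accuracy for a lower-entropy (and sparser) set of indices.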

Findings

1. Achieves a throughput of 2.45 TOPS at 3.6 W of power consumption on an FPGA.
2. Attains a power efficiency of 20.17 TOPS/W on an ASIC.
3. Outperforms state-of-the-art accelerators by 51× in throughput.
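Dividing the two reported FPGA figures gives a rough energy-efficiency estimate, assuming the 3.6 W is the power drawn while sustaining 2.45 TOPS. This is a derived figure, not one stated in the summary:

```latex
\eta_{\text{FPGA}} \approx \frac{2.45\ \text{TOPS}}{3.6\ \text{W}} \approx 0.68\ \text{TOPS/W}
```

The 20.17 TOPS/W figure refers to the separate ASIC implementation, so the two efficiency numbers are not a like-for-like comparison.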

Abstract

With the growing demand for deploying deep learning models to the "edge", it is paramount to develop techniques that make it possible to execute state-of-the-art models under very tight resource constraints. In this work we propose a software-hardware optimization paradigm for obtaining a highly efficient execution engine for deep neural networks (DNNs) based on fully-connected layers. Our approach is centred on compression as a means of reducing the area and power requirements of multilayer perceptrons (MLPs) with high predictive performance. Firstly, we design a novel hardware architecture named FantastIC4, which (1) supports the efficient on-chip execution of multiple compact representations of fully-connected layers and (2) minimizes the number of multipliers required for inference to only 4 (hence the name). Moreover, in order to make the…
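The abstract does not say how the multiplier count is brought down to 4. One standard way to save multiplications when weights take only a few distinct values, shown here purely as an assumed illustration and not as FantastIC4's datapath, is to rewrite the dot product as sum_i w_i*x_i = sum_k c_k * (sum of x_i over all i with w_i = c_k). Activations that share a codebook value c_k are first summed using additions only, and each distinct nonzero value is then multiplied once. The function names, the 4-entry toy codebook (a 4-bit codebook would have up to 16 entries), and the example values are hypothetical.

```python
def dot_direct(x, idx, codebook):
    """Reference dot product: one multiplication per weight."""
    return sum(xi * codebook[k] for xi, k in zip(x, idx))

def dot_accumulate_first(x, idx, codebook):
    """Accumulate-then-multiply dot product (illustrative sketch).

    idx[i] is the codebook index of weight i.
    Returns (result, number_of_multiplications).
    """
    # Step 1: additions only; bucket each activation by its weight's index.
    partial = [0] * len(codebook)
    for xi, k in zip(x, idx):
        partial[k] += xi
    # Step 2: one multiplication per nonzero codebook value;
    # the zero entry contributes nothing and needs no multiplier.
    total, mults = 0, 0
    for c, s in zip(codebook, partial):
        if c != 0:
            total += c * s
            mults += 1
    return total, mults

x = [3, 1, 4, 1, 5, 9, 2, 6]        # activations
codebook = [0, 2, -1, 3]            # toy shared weight values
idx = [1, 0, 2, 1, 3, 2, 1, 0]      # per-weight codebook indices

print(dot_direct(x, idx, codebook))            # 14
print(dot_accumulate_first(x, idx, codebook))  # (14, 3)
```

Here eight weight-activation products collapse into three multiplications. The saving grows with layer width, because the number of multiplications is bounded by the codebook size rather than by the number of inputs.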