FantastIC4: A Hardware-Software Co-Design Approach for Efficiently Running 4bit-Compact Multilayer Perceptrons
Simon Wiedemann, Suhas Shivapakash, Pablo Wiedemann, Daniel Becking, Wojciech Samek, Friedel Gerfers, Thomas Wiegand

TL;DR
This paper introduces FantastIC4, a hardware-software co-design for efficient 4-bit compressed neural network inference, achieving high throughput and power efficiency on edge devices through novel architecture and training methods.
Contribution
It presents a new hardware architecture and entropy-constrained training method enabling efficient 4-bit quantized multilayer perceptrons for edge AI applications.
Findings
Achieves 2.45 TOPS throughput with 3.6W power on FPGA
Attains 20.17 TOPS/W power efficiency on ASIC
Outperforms state-of-the-art accelerators by 51x in throughput
Abstract
With the growing demand for deploying deep learning models to the "edge", it is paramount to develop techniques that allow state-of-the-art models to be executed within very tight and limited resource constraints. In this work we propose a software-hardware optimization paradigm for obtaining a highly efficient execution engine of deep neural networks (DNNs) that are based on fully-connected layers. Our approach is centred around compression as a means for reducing the area as well as power requirements of, concretely, multilayer perceptrons (MLPs) with high predictive performance. Firstly, we design a novel hardware architecture named FantastIC4, which (1) supports the efficient on-chip execution of multiple compact representations of fully-connected layers and (2) minimizes the required number of multipliers for inference down to only 4 (thus the name). Moreover, in order to make the…
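The abstract excerpt does not spell out how four multipliers can suffice for a fully-connected layer. One plausible reading, offered here only as an illustrative sketch and not as the authors' confirmed scheme, is an accumulate-then-multiply layout: each 4-bit weight code is split into four binary bit-planes, inputs are summed under each plane using additions only, and each partial sum is then scaled by one shared coefficient. The function names, the `basis` coefficients, and the bit-plane encoding below are all assumptions made for the example.

```python
import numpy as np

def decompose_4bit(codes):
    """Split 4-bit integer codes (0..15) into four binary bit-planes (assumed encoding)."""
    return [((codes >> i) & 1).astype(np.int64) for i in range(4)]

def acm_matvec(codes, basis, x):
    """Accumulate-then-multiply: y = sum_i basis[i] * (B_i @ x).

    Each output element needs only 4 scalings by a basis coefficient;
    the products with the 0/1 bit-planes reduce to additions in hardware.
    """
    planes = decompose_4bit(codes)
    y = np.zeros(codes.shape[0], dtype=np.float64)
    for coeff, plane in zip(basis, planes):
        partial = plane @ x      # entries of plane are 0/1, so this is a masked sum
        y += coeff * partial     # one multiplication per output per bit-plane
    return y

def dense_matvec(codes, basis, x):
    """Reference: rebuild the dense weight matrix first, then multiply as usual."""
    planes = decompose_4bit(codes)
    weights = sum(coeff * plane for coeff, plane in zip(basis, planes))
    return weights @ x
```

Both functions compute the same product; the difference is that `acm_matvec` performs one multiplication per bit-plane per output, whereas `dense_matvec` performs one per weight.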
