Hyperdrive: A Multi-Chip Systolically Scalable Binary-Weight CNN Inference Engine
Renzo Andri, Lukas Cavigelli, Davide Rossi, Luca Benini

TL;DR
Hyperdrive is a scalable, multi-chip binary-weight network (BWN) inference engine. It combines a binary-weight streaming approach with a systolic arrangement of chips to cut I/O bandwidth and reach high system-level efficiency on ultra-low power devices.
Contribution
It introduces Hyperdrive, a novel multi-chip BWN accelerator with a binary-weight streaming method and systolic architecture, enhancing system-level efficiency and scalability.
Findings
Achieves 4.3 TOp/s/W system-level efficiency (I/O included), 3.1x higher than prior state-of-the-art BWN accelerators.
Supports arbitrarily sized CNNs and input resolutions with a scalable design.
Reaches the reported efficiency even though its core uses resource-intensive FP16 arithmetic, chosen for robustness.
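One way to picture the scalability claim is to split a feature map into rectangular tiles and give one tile to each chip in a 2D mesh. The sketch below is only an illustration under that assumption: the helper names `tile_bounds` and `mesh_tiles`, the 224x224 resolution, and the 2x2 mesh are hypothetical and not taken from the paper, and the sketch ignores any data exchange between neighbouring chips.

```python
def tile_bounds(size, parts, index):
    """Half-open [start, stop) range of tile `index` when `size` is split
    into `parts` near-equal pieces (hypothetical helper)."""
    base, extra = divmod(size, parts)
    start = index * base + min(index, extra)
    stop = start + base + (1 if index < extra else 0)
    return start, stop


def mesh_tiles(height, width, rows, cols):
    """Assign each chip (r, c) of a rows x cols mesh one rectangular tile,
    given as ((row_start, row_stop), (col_start, col_stop))."""
    return {
        (r, c): (tile_bounds(height, rows, r), tile_bounds(width, cols, c))
        for r in range(rows)
        for c in range(cols)
    }


# Assumed example: a 224x224 feature map on a 2x2 mesh of chips.
tiles = mesh_tiles(height=224, width=224, rows=2, cols=2)
for chip, ((r0, r1), (c0, c1)) in sorted(tiles.items()):
    print(f"chip {chip}: rows {r0}..{r1 - 1}, cols {c0}..{c1 - 1}")
```

Under this assumption, a larger input resolution is handled by adding rows or columns of chips, so the tile each chip stores stays the same size.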
Abstract
Deep neural networks have achieved impressive results in computer vision and machine learning. Unfortunately, state-of-the-art networks are extremely compute and memory intensive which makes them unsuitable for mW-devices such as IoT end-nodes. Aggressive quantization of these networks dramatically reduces the computation and memory footprint. Binary-weight neural networks (BWNs) follow this trend, pushing weight quantization to the limit. Hardware accelerators for BWNs presented up to now have focused on core efficiency, disregarding I/O bandwidth and system-level efficiency that are crucial for deployment of accelerators in ultra-low power devices. We present Hyperdrive: a BWN accelerator dramatically reducing the I/O bandwidth exploiting a novel binary-weight streaming approach, which can be used for arbitrarily sized convolutional neural network architecture and input resolution by…
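To make the idea of binary weights concrete, here is a minimal sketch assuming the simplest scheme, in which each weight is replaced by its sign. With +1/-1 weights a dot product needs only additions and subtractions, and each weight takes 1 bit to store or stream instead of 16 bits for FP16. The function names and sample values are illustrative assumptions, not the paper's implementation.

```python
def binarize(weights):
    """Map each real-valued weight to +1 or -1 by sign (zero maps to +1).
    Assumed scheme for illustration only."""
    return [1 if w >= 0 else -1 for w in weights]


def binary_dot(activations, signs):
    """Dot product with +/-1 weights: no multiplications are needed."""
    acc = 0.0
    for a, s in zip(activations, signs):
        acc = acc + a if s > 0 else acc - a
    return acc


def weight_bits(num_weights, bits_per_weight):
    """Total bits needed to store or transfer the weights."""
    return num_weights * bits_per_weight


weights = [0.4, -1.2, 0.0, -0.3]       # made-up full-precision weights
activations = [1.0, 2.0, 3.0, 4.0]     # made-up activations
signs = binarize(weights)              # [1, -1, 1, -1]
result = binary_dot(activations, signs)
print(signs, result)                   # [1, -1, 1, -1] -2.0

# Weight traffic for one million weights: FP16 versus 1-bit binary.
ratio = weight_bits(1_000_000, 16) // weight_bits(1_000_000, 1)
print(ratio)                           # 16
```

The 16x figure covers weight data only; feature maps kept in FP16 are not reduced by binarizing the weights.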
