Streaming Architecture for Large-Scale Quantized Neural Networks on an FPGA-Based Dataflow Platform
Chaim Baskin, Natan Liss, Evgenii Zheltonozhskii, Alex M. Bronshtein, Avi Mendelson

TL;DR
This paper introduces a scalable streaming FPGA architecture for quantized neural networks, supporting skip connections and achieving high accuracy with significant power efficiency improvements over GPUs.
Contribution
The paper presents a novel FPGA-based streaming architecture for QNNs that scales across multiple FPGAs and efficiently supports skip connections, enabling high-performance, low-power neural network inference.
Findings
Achieved 57.5% top-1 accuracy with an 18-layer ResNet on FPGA.
Improved AlexNet accuracy from 41.8% to 51.03% using 2-bit activations.
On ImageNet, the FPGA implementation of ResNet-18 consumes 5x less power than GPUs but runs 4x slower.
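The power and speed figures in the last finding can be combined into a rough energy-per-image comparison. The sketch below is back-of-the-envelope arithmetic, not a result reported in the paper: it assumes "5x less power" refers to average power draw and "4x slower" to time per image, which this summary does not specify. The function name `relative_energy` is hypothetical.

```python
def relative_energy(power_ratio: float, latency_ratio: float) -> float:
    """Energy per image relative to a baseline, using energy = power * time.

    power_ratio   -- device power divided by baseline power (assumed average power)
    latency_ratio -- device time per image divided by baseline time per image
    """
    return power_ratio * latency_ratio


# Assumption: FPGA draws 1/5 of the GPU's power and takes 4x as long per image.
fpga_vs_gpu = relative_energy(power_ratio=1 / 5, latency_ratio=4)
print(f"FPGA energy per image relative to GPU: {fpga_vs_gpu:.2f}")
```

Under these assumptions the FPGA would use about 0.8 of the GPU's energy per image, so the energy advantage is much smaller than the 5x power figure alone suggests.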
Abstract
Deep neural networks (DNNs) are used by different applications that are executed on a range of computer architectures, from IoT devices to supercomputers. The footprint of these networks is huge, as are their computational and communication needs. To ease the pressure on resources, research indicates that in many cases a low-precision representation (1-2 bits per parameter) of weights and other parameters can achieve similar accuracy while requiring fewer resources. Using quantized values enables the use of FPGAs to run NNs, since FPGAs are well suited to these primitives; e.g., FPGAs provide efficient support for bitwise operations and can work with arbitrary-precision representations of numbers. This paper presents a new streaming architecture for running quantized neural networks (QNNs) on FPGAs. The proposed architecture scales out better than alternatives, allowing us to take advantage of systems…
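To illustrate why 1-bit values map well onto bitwise hardware, here is a minimal sketch of the XNOR-popcount dot product commonly used for binarized networks. It is a software illustration of the general technique, not the paper's architecture; the helper names `pack_bits` and `binary_dot` and the bit encoding (+1 as bit 1, -1 as bit 0) are assumptions made for this example.

```python
def pack_bits(values):
    """Pack a sequence of +1/-1 values into an int; bit i is 1 when values[i] == +1."""
    word = 0
    for i, v in enumerate(values):
        if v not in (1, -1):
            raise ValueError("values must be +1 or -1")
        if v == 1:
            word |= 1 << i
    return word


def binary_dot(a_bits, b_bits, n):
    """Dot product of two length-n {+1, -1} vectors given in packed form.

    XNOR marks the positions where the two vectors agree. Each agreement
    contributes +1 and each disagreement -1, so dot = 2 * matches - n.
    """
    mask = (1 << n) - 1
    agree = ~(a_bits ^ b_bits) & mask  # XNOR restricted to the low n bits
    matches = bin(agree).count("1")    # popcount
    return 2 * matches - n


# Usage: compare against the ordinary integer dot product.
a = [1, -1, 1, 1]
b = [1, 1, -1, 1]
reference = sum(x * y for x, y in zip(a, b))
result = binary_dot(pack_bits(a), pack_bits(b), len(a))
print(result, reference)
```

On an FPGA the XNOR and popcount steps correspond to simple logic, which is the kind of bitwise primitive the abstract refers to; multi-bit activations such as the 2-bit case need a wider scheme than this 1-bit sketch.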
Taxonomy
Methods: Average Pooling · Local Response Normalization · Grouped Convolution · Dropout · Dense Connections · Softmax · 1x1 Convolution · Batch Normalization
