Streaming Architecture for Large-Scale Quantized Neural Networks on an FPGA-Based Dataflow Platform

Chaim Baskin; Natan Liss; Evgenii Zheltonozhskii; Alex M. Bronshtein; Avi Mendelson

arXiv:1708.00052 · cs.CV · May 20, 2019

TL;DR

This paper introduces a scalable streaming FPGA architecture for quantized neural networks, supporting skip connections and achieving high accuracy with significant power efficiency improvements over GPUs.

Contribution

The paper presents a novel FPGA-based streaming architecture for quantized neural networks (QNNs) that scales across multiple FPGAs and efficiently supports skip connections, enabling high-performance, low-power neural network inference.

Findings

01

Achieved 57.5% top-1 accuracy with an 18-layer ResNet on FPGA.

02

Improved AlexNet top-1 accuracy from 41.8% to 51.03% by using 2-bit activations.

03

On ImageNet, the FPGA implementation of ResNet-18 consumes 5x less power than GPUs but runs 4x slower.

Abstract

Deep neural networks (DNNs) are used by a variety of applications running on a range of computer architectures, from IoT devices to supercomputers. These networks have a huge footprint and correspondingly large computational and communication needs. Research indicates that, to ease the pressure on resources, a low-precision representation (1-2 bits per parameter) of weights and other parameters can in many cases achieve similar accuracy while requiring fewer resources. Quantized values make it practical to run neural networks (NNs) on FPGAs, since FPGAs are well suited to these primitives; e.g., they provide efficient support for bitwise operations and can work with arbitrary-precision representations of numbers. This paper presents a new streaming architecture for running quantized neural networks (QNNs) on FPGAs. The proposed architecture scales out better than alternatives, allowing us to take advantage of systems…
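
To illustrate why 1-bit values map well onto bitwise hardware, here is a minimal Python sketch of the XNOR-popcount dot product commonly used for binarized networks. It is not taken from the paper: the function names `pack_bits` and `binary_dot` and the encoding (bit set to 1 for +1, 0 for -1) are assumptions made for this example only.

```python
def pack_bits(values):
    """Pack a sequence of +1/-1 values into an int (bit i is 1 when values[i] == +1).

    The encoding is an assumption of this sketch, not the paper's format.
    """
    word = 0
    for i, v in enumerate(values):
        if v == 1:
            word |= 1 << i
    return word


def binary_dot(a_bits, b_bits, n):
    """Dot product of two length-n {-1, +1} vectors stored as packed bit words."""
    mask = (1 << n) - 1
    agree = ~(a_bits ^ b_bits) & mask   # XNOR: 1 wherever the two signs match
    matches = bin(agree).count("1")     # popcount of matching positions
    return 2 * matches - n              # matches minus mismatches


a = [1, -1, 1, 1, -1, -1, 1, -1]
b = [1, 1, -1, 1, -1, 1, 1, -1]

# Reference result computed with ordinary integer multiplication.
reference = sum(x * y for x, y in zip(a, b))
result = binary_dot(pack_bits(a), pack_bits(b), len(a))
assert result == reference
```

The multiply-accumulate loop is replaced by one XNOR and one popcount per word, which is the kind of operation FPGA logic handles cheaply.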

Taxonomy

Methods: Average Pooling · Local Response Normalization · Grouped Convolution · Dropout · Dense Connections · Softmax · 1x1 Convolution · Batch Normalization