FINN-R: An End-to-End Deep-Learning Framework for Fast Exploration of Quantized Neural Networks

Michaela Blott; Thomas Preusser; Nicholas Fraser; Giulio Gambardella; Kenneth O'Brien; Yaman Umuroglu

arXiv:1809.04570·cs.AR·September 13, 2018·31 cites

Open Access

TL;DR

FINN-R is an automated FPGA-based framework that enables rapid exploration and deployment of quantized neural networks, achieving high throughput with reduced precision for embedded and cloud platforms.

Contribution

This paper introduces the second-generation FINN framework, which automates the design of custom low-precision neural network inference engines on FPGAs with formal resource and performance modeling.

Findings

01 Achieved up to 50 TOP/s throughput on the AWS F1 platform.

02 Reached up to 5 TOP/s on embedded devices.

03 Supported a range of neural networks, from CIFAR-10 classifiers to YOLO-based object detection.

Abstract

Convolutional Neural Networks have rapidly become the most successful machine learning algorithm, enabling ubiquitous machine vision and intelligent decisions on even embedded computing systems. While the underlying arithmetic is structurally simple, compute and memory requirements are challenging. One of the promising opportunities is leveraging reduced-precision representations for inputs, activations and model parameters. The resulting scalability in performance, power efficiency and storage footprint provides interesting design compromises in exchange for a small reduction in accuracy. FPGAs are ideal for exploiting low-precision inference engines leveraging custom precisions to achieve the required numerical accuracy for a given application. In this article, we describe the second generation of the FINN framework, an end-to-end tool which enables design space exploration and…

Taxonomy

Topics: Advanced Neural Network Applications · Parallel Computing and Optimization Techniques · Machine Learning and Data Classification