FINN-R: An End-to-End Deep-Learning Framework for Fast Exploration of Quantized Neural Networks
Michaela Blott, Thomas Preusser, Nicholas Fraser, Giulio Gambardella, Kenneth O'Brien, Yaman Umuroglu

TL;DR
FINN-R is an automated FPGA-based framework that enables rapid exploration and deployment of quantized neural networks, achieving high throughput with reduced precision for embedded and cloud platforms.
Contribution
This paper introduces the second-generation FINN framework, which automates the design of custom low-precision neural network inference engines on FPGAs with formal resource and performance modeling.
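The performance side of such a model can be illustrated with a simplified cycle estimate. The sketch assumes that each layer is a matrix-vector computation whose rows are spread over PE processing elements and whose columns are spread over SIMD lanes, so one matrix-vector product takes roughly ceil(rows/PE) * ceil(cols/SIMD) cycles. It also assumes a streaming pipeline in which the slowest layer sets the frame rate. The class `MVULayer`, the helper functions, the 200 MHz clock, and all layer sizes and parallelism values are hypothetical. The sketch covers only performance and leaves out the resource (LUT, BRAM) side of the paper's modeling.

```python
from dataclasses import dataclass
from math import ceil


@dataclass
class MVULayer:
    """Hypothetical description of one layer mapped to a matrix-vector compute unit."""
    name: str
    rows: int        # matrix height: output neurons or output channels
    cols: int        # matrix width: inputs per output, e.g. K*K*C_in for a convolution
    out_pixels: int  # matrix-vector products per frame (1 for a fully connected layer)
    pe: int          # processing elements, each handling one row at a time
    simd: int        # lanes per PE, each handling one column per cycle


def cycles_per_frame(layer: MVULayer) -> int:
    """Fold = ceil(rows/PE) * ceil(cols/SIMD) cycles per product, times products per frame."""
    fold = ceil(layer.rows / layer.pe) * ceil(layer.cols / layer.simd)
    return fold * layer.out_pixels


def pipeline_estimate(layers, clock_hz):
    """Estimate steady-state throughput of a layer-per-stage streaming pipeline."""
    per_layer = {layer.name: cycles_per_frame(layer) for layer in layers}
    bottleneck = max(per_layer, key=per_layer.get)  # slowest stage limits frame rate
    fps = clock_hz / per_layer[bottleneck]
    # Count each multiply-accumulate as two operations (one multiply, one add).
    ops_per_frame = sum(2 * layer.rows * layer.cols * layer.out_pixels for layer in layers)
    return per_layer, bottleneck, fps, ops_per_frame * fps


# Hypothetical two-layer network at an assumed 200 MHz clock.
layers = [
    MVULayer("conv1", rows=64, cols=3 * 3 * 3, out_pixels=30 * 30, pe=16, simd=3),
    MVULayer("fc1", rows=512, cols=256, out_pixels=1, pe=8, simd=16),
]
per_layer, bottleneck, fps, ops_per_s = pipeline_estimate(layers, clock_hz=200e6)
print(per_layer, bottleneck, f"{fps:.0f} frames/s", f"{ops_per_s / 1e9:.1f} GOP/s")
```

Here `conv1` needs 32,400 cycles per frame, compared with 1,024 for `fc1`, so it is the bottleneck. Raising its `pe` or `simd` would shorten its fold and increase the frame rate at the cost of more compute hardware. A model of this kind lets a designer explore that trade-off before synthesis.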
Findings
Achieved 50 TOP/s throughput on the AWS F1 platform.
Demonstrated 5 TOP/s on embedded devices.
Supported networks ranging from CIFAR-10 image classifiers to YOLO object detectors.
Abstract
Convolutional Neural Networks have rapidly become the most successful machine learning algorithm, enabling ubiquitous machine vision and intelligent decisions even on embedded computing systems. While the underlying arithmetic is structurally simple, compute and memory requirements are challenging. One of the promising opportunities is leveraging reduced-precision representations for inputs, activations and model parameters. The resulting scalability in performance, power efficiency and storage footprint provides interesting design compromises in exchange for a small reduction in accuracy. FPGAs are ideal for exploiting low-precision inference engines leveraging custom precisions to achieve the required numerical accuracy for a given application. In this article, we describe the second generation of the FINN framework, an end-to-end tool which enables design space exploration and…
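The reduced-precision idea can be sketched in plain Python. The example shows two things. The first is a generic symmetric uniform quantizer. The second is the standard XNOR-popcount formulation of a binarized dot product: ±1 values are packed into bits, so each multiply-accumulate becomes a bitwise XNOR followed by a population count. Both are illustrative assumptions rather than FINN-R's exact quantization scheme. The function names and the sample weights and activations are hypothetical.

```python
def quantize_uniform(values, bits, max_abs):
    """Symmetric uniform quantization to integers in [-L, L], L = 2**(bits-1) - 1 (bits >= 2)."""
    levels = 2 ** (bits - 1) - 1        # e.g. 7 for 4 bits, 127 for 8 bits
    scale = max_abs / levels            # real value represented by one integer step
    q = [max(-levels, min(levels, round(v / scale))) for v in values]
    return q, scale


def binarize(values):
    """1-bit quantization: +1 for non-negative values, -1 otherwise."""
    return [1 if v >= 0 else -1 for v in values]


def pack_bits(signs):
    """Encode +1 as bit 1 and -1 as bit 0, with element i stored in bit i."""
    word = 0
    for i, s in enumerate(signs):
        if s == 1:
            word |= 1 << i
    return word


def xnor_popcount_dot(a_bits, b_bits, n):
    """Dot product of two packed ±1 vectors of length n."""
    mask = (1 << n) - 1
    agree = ~(a_bits ^ b_bits) & mask   # XNOR: bit set where the signs match
    matches = bin(agree).count("1")     # popcount
    return 2 * matches - n              # matches add +1, mismatches add -1


w = [0.7, -0.2, 0.05, -0.9, 0.3, -0.4, 0.8, -0.1]   # hypothetical weights
x = [-0.5, -0.3, 0.6, 0.2, 0.9, -0.7, 0.1, 0.4]     # hypothetical activations

q4, scale = quantize_uniform(w, bits=4, max_abs=1.0)
wb, xb = binarize(w), binarize(x)
direct = sum(a * b for a, b in zip(wb, xb))
packed = xnor_popcount_dot(pack_bits(wb), pack_bits(xb), len(wb))
print(q4, direct, packed)
```

With 1-bit weights, the eight values above occupy 8 bits instead of the 256 bits needed at 32-bit floating point. That 32x smaller footprint illustrates the storage side of the trade-off described in the abstract.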
Taxonomy
Topics: Advanced Neural Network Applications · Parallel Computing and Optimization Techniques · Machine Learning and Data Classification
