TL;DR
FINN is an FPGA framework that maps binarized neural networks to hardware, reaching up to 12.3 million MNIST classifications per second on a ZC706 embedded platform that draws less than 25 W of total system power.
Contribution
The paper introduces FINN, a novel FPGA-based framework for fast, scalable binarized neural network inference with customizable compute resources per layer.
Findings
Achieves up to 12.3 million classifications per second on MNIST, with 0.31 µs latency and 95.8% accuracy
Demonstrates 21,906 classifications per second on CIFAR-10 and SVHN
Reports the fastest classification rates to date on these benchmarks
Abstract
Research has shown that convolutional neural networks contain significant redundancy, and high classification accuracy can be obtained even when weights and activations are reduced from floating point to binary values. In this paper, we present FINN, a framework for building fast and flexible FPGA accelerators using a flexible heterogeneous streaming architecture. By utilizing a novel set of optimizations that enable efficient mapping of binarized neural networks to hardware, we implement fully connected, convolutional and pooling layers, with per-layer compute resources being tailored to user-provided throughput requirements. On a ZC706 embedded FPGA platform drawing less than 25 W total system power, we demonstrate up to 12.3 million image classifications per second with 0.31 µs latency on the MNIST dataset with 95.8% accuracy, and 21906 image classifications per second with 283…
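The abstract does not spell out the arithmetic that makes binary weights and activations cheap in hardware. One widely used formulation, shown here as a minimal sketch and not as FINN's actual implementation, encodes +1 as bit 1 and -1 as bit 0, so a dot product reduces to an XNOR followed by a population count. The function names (`binarize`, `pack`, `xnor_popcount_dot`, `reference_dot`) and the sign-based binarization rule are assumptions made for illustration.

```python
def binarize(values):
    """Map real values to bits: 1 encodes +1 (value >= 0), 0 encodes -1.
    The >= 0 threshold is an illustrative assumption."""
    return [1 if v >= 0 else 0 for v in values]


def pack(bits):
    """Pack a list of 0/1 bits into one integer, bit i holding element i."""
    word = 0
    for i, b in enumerate(bits):
        word |= b << i
    return word


def xnor_popcount_dot(a_word, w_word, n):
    """Dot product of two n-element {-1, +1} vectors stored as packed bits.

    XNOR marks the positions where the two vectors agree (product +1).
    With `matches` agreeing positions, the dot product is
    matches - (n - matches) = 2 * matches - n.
    """
    mask = (1 << n) - 1                 # keep only the n valid bit positions
    agree = ~(a_word ^ w_word) & mask   # XNOR restricted to n bits
    matches = bin(agree).count("1")     # population count
    return 2 * matches - n


def reference_dot(a_bits, w_bits):
    """Plain +/-1 dot product, used only to cross-check the bitwise version."""
    return sum((2 * a - 1) * (2 * w - 1) for a, w in zip(a_bits, w_bits))
```

For example, calling `xnor_popcount_dot(pack(a), pack(w), len(a))` on two bit lists `a` and `w` of equal length gives the same integer as `reference_dot(a, w)`, while using only bitwise operations and a count, which is the kind of logic that maps compactly onto FPGA lookup tables.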
