DLA: Compiler and FPGA Overlay for Neural Network Inference Acceleration
Mohamed S. Abdelfattah, David Han, Andrew Bitar, Roberto DiCecco, Shane O'Connell, Nitika Shanker, Joseph Chu, Ian Prins, Joshua Fender, Andrew C. Ling, Gordon R. Chiu

TL;DR
This paper presents a specialized FPGA overlay and a domain-specific compiler for neural network inference, achieving high performance with minimal overhead and enabling fast development cycles for deep learning applications.
Contribution
It introduces a low-overhead, application-specific FPGA overlay and a graph compiler that optimizes deep learning models for efficient inference.
Findings
Achieves ~1% overhead for overlay control and reprogramming.
Provides architecture-driven optimizations boosting CNN/RNN performance.
Reaches ~900 fps on GoogLeNet, the fastest reported on similar FPGAs.
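To put the throughput figure in perspective, the short sketch below converts frames per second into an average time per frame. The helper name `avg_time_per_frame_ms` is made up for illustration, and the 900 fps input is the approximate value quoted above. Note that this is an average derived from throughput only; it says nothing about the end-to-end latency of a single image, which depends on batching and pipelining details not covered in this summary.

```python
def avg_time_per_frame_ms(fps: float) -> float:
    """Return the average time spent per frame, in milliseconds.

    This is simply the reciprocal of throughput; it is not a
    single-image latency measurement.
    """
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


# Approximate GoogLeNet throughput quoted in the findings (assumed exact here).
googlenet_fps = 900.0
print(f"{avg_time_per_frame_ms(googlenet_fps):.2f} ms per frame")
```

At roughly 900 fps, the accelerator finishes one GoogLeNet inference a little more often than once per millisecond on average.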
Abstract
Overlays have shown significant promise for field-programmable gate arrays (FPGAs) as they allow for fast development cycles and remove many of the challenges of the traditional FPGA hardware design flow. However, this often comes with a significant performance burden, resulting in very little adoption of overlays for practical applications. In this paper, we tailor an overlay to a specific application domain, and we show how we maintain its full programmability without paying for the performance overhead traditionally associated with overlays. Specifically, we introduce an overlay targeted for deep neural network inference with only ~1% overhead to support the control and reprogramming logic using a lightweight very long instruction word (VLIW) network. Additionally, we implement a sophisticated domain-specific graph compiler that compiles deep learning languages such as Caffe or…
