DPUV3INT8: A Compiler View to programmable FPGA Inference Engines
Paolo D'Alberto, Jiangsha Ma, Jintao Li, Yiming Hu, Manasa Bollavaram, Shaoxia Fang

TL;DR
This paper describes the DPUV3INT8 FPGA inference engine and a compiler that generalizes hand-tuned optimizations, reaching about 1.5x better performance on Resnet50_v1 and 80+% hardware efficiency across a model zoo of networks.
Contribution
It introduces a compiler framework that generalizes hand-optimized FPGA inference techniques, achieving high throughput and efficiency across various models.
Findings
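Hand-tuned SW-HW solution for Resnet50_v1 reaches close to 2x the throughput (images per second) of the authors' best FPGA implementation
Compiler generalizes the hand-written techniques and achieves about 1.5x better performance on the same Resnet50_v1 example
Compiler extends the optimizations to a model zoo of networks and achieves 80+% hardware efficiency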
Abstract
We have an FPGA design that we have made fast, efficient, and tested on a few important examples. Now we must infer a general solution to deploy in the data center. Here, we describe the FPGA DPUV3INT8 design and our compiler effort. The hand-tuned SW-HW solution for Resnet50_v1 has (close to) 2 times better throughput (images per second) than our best FPGA implementation. The compiler generalizes the hand-written techniques, achieving about 1.5 times better performance for the same example. The compiler also generalizes the optimizations to a model zoo of networks, and it achieves 80+% HW efficiency.
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Embedded Systems Design Techniques · Interconnection Networks and Systems
