DPUV3INT8: A Compiler View to programmable FPGA Inference Engines

Paolo D'Alberto; Jiangsha Ma; Jintao Li; Yiming Hu; Manasa Bollavaram; Shaoxia Fang

arXiv:2110.04327·cs.CL·October 12, 2021·1 cites

Open Access

TL;DR

This paper presents a compiler approach for FPGA inference engines that significantly improves performance and efficiency across multiple neural network models, generalizing hand-tuned solutions.

Contribution

It introduces a compiler framework that generalizes hand-optimized FPGA inference techniques, achieving high throughput and efficiency across various models.

Findings

01

Hand-tuned SW-HW solution reaches close to 2x the throughput (images per second) of the authors' best previous FPGA implementation for Resnet50_v1

02

Compiler generalizes the hand-written techniques and achieves about 1.5x better performance on the same Resnet50_v1 example, against close to 2x for the hand-tuned solution

03

80+% hardware efficiency across tested models

Abstract

We have an FPGA design that we have made fast, efficient, and tested for a few important examples. Now we must infer a general solution to deploy in the data center. Here, we describe the FPGA DPUV3INT8 design and our compiler effort. The hand-tuned SW-HW solution for Resnet50_v1 has (close to) 2 times better images per second (throughput) than our best FPGA implementation. The compiler generalizes the hand-written techniques, achieving about 1.5 times better performance for the same example; it also generalizes the optimizations to a model zoo of networks, and it achieves 80+% HW efficiency.
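
The ratios quoted in the abstract can be related to one another with a little arithmetic. The sketch below is illustrative only: the throughput, operation-count, and peak-rate values are hypothetical placeholders (the abstract gives ratios, not absolute numbers), and `hw_efficiency` uses one common definition (achieved operations per second over peak operations per second), which may differ from the definition used in the paper.

```python
def speedup(new_throughput: float, baseline_throughput: float) -> float:
    """Ratio of two throughputs measured in the same unit (e.g. images/s)."""
    if baseline_throughput <= 0:
        raise ValueError("baseline throughput must be positive")
    return new_throughput / baseline_throughput


def hw_efficiency(ops_per_inference: float,
                  inferences_per_second: float,
                  peak_ops_per_second: float) -> float:
    """Achieved ops/s divided by peak ops/s (an assumed, common definition)."""
    if peak_ops_per_second <= 0:
        raise ValueError("peak rate must be positive")
    return ops_per_inference * inferences_per_second / peak_ops_per_second


# Hypothetical throughputs in images/s, chosen only to mirror the stated ratios.
baseline = 1000.0    # previous best FPGA implementation (placeholder value)
hand_tuned = 2000.0  # hand-tuned SW-HW solution, about 2x the baseline
compiled = 1500.0    # compiler-generated solution, about 1.5x the baseline

print(speedup(hand_tuned, baseline))   # hand-tuned vs. baseline
print(speedup(compiled, baseline))     # compiler vs. baseline
print(speedup(compiled, hand_tuned))   # compiler vs. hand-tuned

# Hypothetical workload: 8e9 ops per image at 1000 images/s on a 1e13 ops/s device.
print(hw_efficiency(8e9, 1000.0, 1e13))
```

Under these placeholder numbers the compiler-generated code reaches about three quarters of the hand-tuned throughput, which is what a 1.5x and a 2x gain over the same baseline imply.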

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Embedded Systems Design Techniques · Interconnection Networks and Systems