High Performance Scalable FPGA Accelerator for Deep Neural Networks

Sudarshan Srinivasan; Pradeep Janedula; Saurabh Dhoble; Sasikanth Avancha; Dipankar Das; Naveen Mellempudi; Bharat Daga; Martin Langhammer; Gregg Baeckler; Bharat Kaul

arXiv:1908.11809·cs.DC·September 2, 2019

TL;DR

This paper presents a high-performance FPGA accelerator for CNN inference using low-precision INT-8-2 compute (8-bit activations, 2-bit weights), achieving performance that surpasses CPUs and GPUs and approaches ASIC levels while retaining FPGA versatility.

Contribution

The work introduces a novel ALM-based FPGA design for INT-8-2 compute, a precision that no known ASIC, CPU, or GPU natively supports, enabling high AI-TOPS performance for CNN inference.

Findings

1. Achieves a measured 5 AI-TOPS on an Arria10 FPGA.
2. Projects 76 AI-TOPS at 0.7 TOPS/W on Stratix10.
3. Surpasses known CPU and GPU performance, approaching ASIC levels.
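As a back-of-the-envelope check (derived arithmetic, not a figure reported by the authors), the projected Stratix10 throughput and efficiency together imply a power budget of roughly 109 W, assuming the TOPS/W figure is counted in the same AI-TOPS units:

```latex
P \approx \frac{\text{throughput}}{\text{efficiency}}
  = \frac{76\ \text{AI-TOPS}}{0.7\ \text{TOPS/W}}
  \approx 108.6\ \text{W}
```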

Abstract

Low precision is the first-order knob for achieving higher Artificial Intelligence Operations (AI-TOPS). However, the algorithmic space for sub-8-bit precision compute is diverse, with disruptive changes happening frequently, making FPGAs a natural choice for Deep Neural Network inference. In this work we present an FPGA-based accelerator for CNN inference. We use INT-8-2 compute (8-bit activations and 2-bit weights), which has recently shown promise in the literature and which no known ASIC, CPU, or GPU natively supports today. Using a novel Adaptive Logic Module (ALM) based design, as a departure from traditional DSP-based designs, we achieve a measured performance of 5 AI-TOPS on Arria10 and project a performance of 76 AI-TOPS at 0.7 TOPS/W on Stratix10. This exceeds known CPU and GPU performance and comes close to best known…
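To make INT-8-2 compute concrete, here is a minimal Python sketch, not the authors' implementation. It assumes the 2-bit weights are two's-complement codes for {-2, -1, 0, 1} (the summary does not state the paper's encoding), packs four weights per byte, and computes a dot product in which every 8-bit by 2-bit product reduces to a zero, pass-through, negate, or shift. The function names (pack_weights, unpack_weights, mul_by_code, dot_int8_2) are hypothetical.

```python
# Minimal sketch of an INT-8-2 dot product (8-bit activations, 2-bit weights).
# ASSUMPTION: weights use 2-bit two's-complement codes for {-2, -1, 0, 1};
# this encoding and all function names here are hypothetical, not the paper's.
from typing import List

# Decode table: 2-bit two's-complement code -> integer weight.
DECODE = {0b00: 0, 0b01: 1, 0b10: -2, 0b11: -1}


def pack_weights(weights: List[int]) -> bytes:
    """Pack 2-bit weights four per byte, first weight in the lowest bits."""
    out = bytearray()
    for i in range(0, len(weights), 4):
        byte = 0
        for j, w in enumerate(weights[i:i + 4]):
            if not -2 <= w <= 1:
                raise ValueError(f"weight {w} does not fit in 2 bits")
            byte |= (w & 0b11) << (2 * j)  # w & 0b11 yields the two's-complement code
        out.append(byte)
    return bytes(out)


def unpack_weights(packed: bytes, n: int) -> List[int]:
    """Inverse of pack_weights: recover the first n integer weights."""
    return [DECODE[(packed[i // 4] >> (2 * (i % 4))) & 0b11] for i in range(n)]


def mul_by_code(a: int, code: int) -> int:
    """Multiply activation a by a 2-bit weight code without a multiplier:
    each case is a zero, pass-through, negate, or shift-and-negate."""
    if code == 0b00:
        return 0
    if code == 0b01:
        return a
    if code == 0b11:
        return -a
    return -(a << 1)  # code 0b10 encodes weight -2


def dot_int8_2(acts: List[int], packed: bytes) -> int:
    """Accumulate sum(a_i * w_i) for signed 8-bit activations and packed weights."""
    if len(acts) > 4 * len(packed):
        raise ValueError("not enough packed weights for the activations")
    total = 0  # Python ints never overflow; hardware needs a wide enough accumulator
    for i, a in enumerate(acts):
        if not -128 <= a <= 127:
            raise ValueError(f"activation {a} does not fit in 8 bits")
        code = (packed[i // 4] >> (2 * (i % 4))) & 0b11
        total += mul_by_code(a, code)
    return total


if __name__ == "__main__":
    acts = [10, -3, 127, 5, -128]
    weights = [1, -1, -2, 0, 1]
    packed = pack_weights(weights)
    print(len(packed), dot_int8_2(acts, packed))  # prints: 2 -369
```

Packing at 2 bits per weight takes a quarter of the storage of 8-bit weights. The multiplier-free product is one plausible reason such arithmetic suits general-purpose logic (ALMs) rather than DSP multipliers, though this summary does not describe the paper's actual datapath.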