FPGA-Based CNN Inference Accelerator Synthesized from Multi-Threaded C Software

Jin Hee Kim; Brett Grady; Ruolong Lian; John Brothers; Jason H. Anderson

arXiv:1807.10695·cs.LG·July 30, 2018

TL;DR

This paper presents an FPGA-based CNN inference accelerator synthesized from multi-threaded C software. The LegUp high-level synthesis tool converts the software's thread-level parallelism into parallel hardware, reaching a peak of 138 effective GOPS on VGG-16 on a mid-sized Intel Arria 10 SoC FPGA.

Contribution

The paper synthesizes a CNN inference accelerator from Pthreads-parallelized C code using high-level synthesis. The accelerator incorporates reduced precision and a novel approach to zero-weight skipping in convolution.
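To make the idea of zero-weight skipping concrete, here is a minimal sketch of one generic way to do it: pack only the nonzero weights together with their positions, then run the multiply-accumulate loop over the packed list. This is not the paper's scheme, which this summary does not describe. The 8-bit weight width, the `nz_weight_t` layout, and the function names `pack_nonzero` and `sparse_dot` are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical packed form: one nonzero weight plus its position in the
 * dense filter. The 8-bit width is an assumed reduced precision. */
typedef struct {
    int8_t   value;
    uint16_t index;
} nz_weight_t;

/* Copy the nonzero entries of dense[0..n) into out[] and return how many
 * were kept. out[] must have room for n entries in the worst case. */
size_t pack_nonzero(const int8_t *dense, size_t n, nz_weight_t *out)
{
    assert(n <= 65536); /* indices must fit in 16 bits */
    size_t nnz = 0;
    for (size_t i = 0; i < n; i++) {
        if (dense[i] != 0) {
            out[nnz].value = dense[i];
            out[nnz].index = (uint16_t)i;
            nnz++;
        }
    }
    return nnz;
}

/* Dot product that iterates only over the packed nonzero weights, so a
 * zero weight costs no multiply-accumulate step. Products are widened to
 * 32 bits before accumulation. */
int32_t sparse_dot(const nz_weight_t *w, size_t nnz, const int8_t *act)
{
    int32_t acc = 0;
    for (size_t i = 0; i < nnz; i++)
        acc += (int32_t)w[i].value * (int32_t)act[w[i].index];
    return acc;
}
```

In this sketch the packing is done once per filter (weights are fixed at inference time), while `sparse_dot` runs once per output value, so the loop trip count shrinks in proportion to the filter's sparsity.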

Findings

01  Peak performance of 138 effective GOPS on VGG-16

02  Successful synthesis of convolution, pooling, and padding in FPGA hardware

03  System combines FPGA accelerator with embedded ARM processor

Abstract

A deep-learning inference accelerator is synthesized from a C-language software program parallelized with Pthreads. The software implementation uses the well-known producer/consumer model with parallel threads interconnected by FIFO queues. The LegUp high-level synthesis (HLS) tool synthesizes threads into parallel FPGA hardware, translating software parallelism into spatial parallelism. A complete system is generated where convolution, pooling and padding are realized in the synthesized accelerator, with remaining tasks executing on an embedded ARM processor. The accelerator incorporates reduced precision, and a novel approach for zero-weight-skipping in convolution. On a mid-sized Intel Arria 10 SoC FPGA, peak performance on VGG-16 is 138 effective GOPS.
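The producer/consumer structure described in the abstract can be sketched in plain Pthreads as two threads connected by a bounded FIFO. This is an illustration under stated assumptions, not the authors' code: the `fifo_t` type, its depth, and the functions `fifo_write`, `fifo_read`, and `run_pipeline` are hypothetical names, and the FIFO is hand-rolled with a mutex and condition variables rather than taken from any HLS library.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define FIFO_DEPTH 4 /* assumed queue depth, chosen arbitrarily */

/* Hypothetical bounded FIFO connecting two threads. */
typedef struct {
    int buf[FIFO_DEPTH];
    size_t head, tail, count;
    pthread_mutex_t lock;
    pthread_cond_t not_empty, not_full;
} fifo_t;

void fifo_init(fifo_t *q)
{
    q->head = q->tail = q->count = 0;
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->not_empty, NULL);
    pthread_cond_init(&q->not_full, NULL);
}

void fifo_destroy(fifo_t *q)
{
    pthread_cond_destroy(&q->not_full);
    pthread_cond_destroy(&q->not_empty);
    pthread_mutex_destroy(&q->lock);
}

/* Blocks while the queue is full. */
void fifo_write(fifo_t *q, int v)
{
    pthread_mutex_lock(&q->lock);
    while (q->count == FIFO_DEPTH)
        pthread_cond_wait(&q->not_full, &q->lock);
    q->buf[q->tail] = v;
    q->tail = (q->tail + 1) % FIFO_DEPTH;
    q->count++;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
}

/* Blocks while the queue is empty. */
int fifo_read(fifo_t *q)
{
    pthread_mutex_lock(&q->lock);
    while (q->count == 0)
        pthread_cond_wait(&q->not_empty, &q->lock);
    int v = q->buf[q->head];
    q->head = (q->head + 1) % FIFO_DEPTH;
    q->count--;
    pthread_cond_signal(&q->not_full);
    pthread_mutex_unlock(&q->lock);
    return v;
}

typedef struct { fifo_t *in; fifo_t *out; int n; } stage_arg_t;

/* Producer stage: streams the values 1..n into its output FIFO. */
static void *producer(void *p)
{
    stage_arg_t *a = p;
    for (int i = 1; i <= a->n; i++)
        fifo_write(a->out, i);
    return NULL;
}

/* Consumer stage: sums n values from its input FIFO and forwards the sum. */
static void *accumulator(void *p)
{
    stage_arg_t *a = p;
    int sum = 0;
    for (int i = 0; i < a->n; i++)
        sum += fifo_read(a->in);
    fifo_write(a->out, sum);
    return NULL;
}

/* Builds the two-stage pipeline, runs it, and returns the accumulated sum. */
int run_pipeline(int n)
{
    assert(n >= 0);
    fifo_t data, result;
    fifo_init(&data);
    fifo_init(&result);
    stage_arg_t pa = { NULL, &data, n };
    stage_arg_t ca = { &data, &result, n };
    pthread_t pt, ct;
    int rc = pthread_create(&pt, NULL, producer, &pa);
    assert(rc == 0);
    rc = pthread_create(&ct, NULL, accumulator, &ca);
    assert(rc == 0);
    (void)rc;
    int sum = fifo_read(&result);
    pthread_join(pt, NULL);
    pthread_join(ct, NULL);
    fifo_destroy(&data);
    fifo_destroy(&result);
    return sum;
}
```

Calling `run_pipeline(10)` from a `main` function and compiling with `-pthread` exercises the pipeline in software. In an HLS flow of the kind the abstract describes, each thread would become its own hardware unit and each queue a hardware FIFO, which is how thread-level parallelism turns into spatial parallelism.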
