A GPU-Outperforming FPGA Accelerator Architecture for Binary Convolutional Neural Networks

Yixing Li; Zichuan Liu; Kai Xu; Hao Yu; Fengbo Ren

arXiv:1702.06392·cs.DC·June 9, 2017·5 cites

TL;DR

This paper presents an FPGA accelerator architecture for binary CNNs that outperforms GPUs in throughput and energy efficiency, especially for small batch sizes, by leveraging massive spatial parallelism and deep pipelining.

Contribution

The paper introduces an optimized FPGA architecture for binary CNNs that achieves higher throughput and energy efficiency than GPUs, with performance insensitive to batch size.

Findings

01. 8.3x faster than Titan X GPU for small batch processing

02. 75x more energy-efficient than Titan X GPU in small batch scenarios

03. Comparable throughput to GPU for large batch processing

Abstract

FPGA-based hardware accelerators for convolutional neural networks (CNNs) have attracted considerable attention due to their higher energy efficiency than GPUs. However, it is challenging for FPGA-based solutions to achieve a higher throughput than their GPU counterparts. In this paper, we demonstrate that FPGA acceleration can be a superior solution in terms of both throughput and energy efficiency when a CNN is trained with binary constraints on weights and activations. Specifically, we propose an optimized FPGA accelerator architecture tailored for bitwise convolution and normalization that features massive spatial parallelism with deep pipeline stages. A key advantage of the FPGA accelerator is that its performance is insensitive to data batch size, while the performance of GPU acceleration varies widely depending on the batch size of the data. Experimental results show that the proposed…
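To make "bitwise convolution" concrete, the following is a minimal Python sketch of the commonly used XNOR-and-popcount formulation of a binary dot product, assuming weights and activations are constrained to +1/-1 and encoded as single bits (1 for +1, 0 for -1). It is an illustration under those assumptions, not the authors' FPGA design; the function names `pack_bits`, `xnor_popcount_dot`, and `reference_dot` are hypothetical.

```python
import random


def pack_bits(values):
    """Pack a list of +1/-1 values into an int; bit i is 1 when values[i] == +1."""
    word = 0
    for i, v in enumerate(values):
        if v == 1:
            word |= 1 << i
    return word


def xnor_popcount_dot(a_bits, w_bits, n):
    """Dot product of two length-n {+1, -1} vectors given in packed form."""
    mask = (1 << n) - 1
    agree = ~(a_bits ^ w_bits) & mask  # XNOR: 1 wherever the two signs match
    matches = bin(agree).count("1")    # popcount of the matching positions
    return 2 * matches - n             # (#matches) - (#mismatches)


def reference_dot(a, w):
    """Plain integer dot product, used only to check the bitwise version."""
    return sum(x * y for x, y in zip(a, w))


# Check the bitwise version against the reference on a random 64-element vector pair.
rng = random.Random(0)
n = 64
a = [rng.choice((-1, 1)) for _ in range(n)]
w = [rng.choice((-1, 1)) for _ in range(n)]
assert xnor_popcount_dot(pack_bits(a), pack_bits(w), n) == reference_dot(a, w)
```

Because each multiply-accumulate reduces to one XNOR gate plus a share of a popcount tree, many such operations can be laid out side by side in logic, which is one plausible reading of the "massive spatial parallelism" the abstract refers to.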

Taxonomy

Topics: Advanced Neural Network Applications · Machine Learning and ELM · Advanced Memory and Neural Computing

Methods: Convolution