FPGA Based Implementation of Deep Neural Networks Using On-chip Memory Only

Jinhwan Park; Wonyong Sung

arXiv:1602.01616·cs.AR·August 30, 2016

TL;DR

This paper presents an FPGA-based fixed-point deep neural network system that uses only on-chip memory. On MNIST digit recognition and TIMIT phoneme recognition, it runs at about one quarter of the speed of a GPU implementation while consuming less than 5 W.

Contribution

The work implements DNNs on an FPGA (Xilinx XC7Z045) using on-chip memory only. To fit the limited on-chip memory, weights are stored with 3 bits, and training-based fixed-point weight optimization is used.

Findings

1. Speed is about one quarter of that of a GPU-based implementation.

2. Power consumption is less than 5 W.

3. Speed is much better than that of a PC-based implementation.

Abstract

Deep neural networks (DNNs) demand a very large amount of computation and weight storage, so efficient implementation on special-purpose hardware is highly desirable. In this work, we have developed an FPGA-based fixed-point DNN system that uses only on-chip memory and thus avoids accesses to external DRAM. The execution time and energy consumption of the developed system are compared with those of a GPU-based implementation. Since the memory capacity of an FPGA is limited, only 3-bit weights are used for this implementation, and training-based fixed-point weight optimization is employed. The implementation, on a Xilinx XC7Z045, is tested on the MNIST handwritten digit recognition benchmark and a phoneme recognition task on the TIMIT corpus. The obtained speed is about one quarter of that of a GPU-based implementation and much better than that of a PC-based one. The power consumption is less than 5 W at the full…
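
To make the 3-bit weight idea concrete, here is a minimal sketch of symmetric uniform quantization and the resulting storage saving. It is an illustration, not the authors' method: the seven-level grid (-3 to +3 times a step size), the step value, and the function names `quantize_weights`, `dequantize`, and `storage_bits` are all assumptions, and the training-based optimization mentioned in the abstract (retraining with quantized weights) is not reproduced here.

```python
def quantize_weights(weights, step, bits=3):
    """Map each weight to an integer code on a symmetric uniform grid.

    Assumption: with `bits` bits we use the 2**bits - 1 levels
    -max_level .. +max_level (7 levels for 3 bits), scaled by `step`.
    """
    max_level = 2 ** (bits - 1) - 1  # 3 when bits == 3
    codes = []
    for w in weights:
        level = round(w / step)                         # nearest grid index
        level = max(-max_level, min(max_level, level))  # saturate out-of-range values
        codes.append(level)
    return codes


def dequantize(codes, step):
    """Recover the fixed-point weight values represented by the codes."""
    return [c * step for c in codes]


def storage_bits(num_weights, bits=3):
    """Total bits needed to store the weight codes."""
    return num_weights * bits


# Hypothetical example values, chosen only for illustration.
weights = [0.26, -0.9, 0.04, 0.5]
step = 0.25
codes = quantize_weights(weights, step)
print(codes)                    # [1, -3, 0, 2]
print(dequantize(codes, step))  # [0.25, -0.75, 0.0, 0.5]

# One million weights: 3-bit codes versus 32-bit floats.
print(storage_bits(1_000_000, bits=3))   # 3000000
print(storage_bits(1_000_000, bits=32))  # 32000000
```

In this sketch the value -0.9 saturates to the lowest level, -0.75. Direct quantization of this kind loses accuracy, which is presumably why the abstract pairs the 3-bit weights with training-based optimization rather than simple rounding.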
