FPGA Implementations of 3D-SIMD Processor Architecture for Deep Neural Networks Using Relative Indexed Compressed Sparse Filter Encoding Format and Stacked Filters Stationary Flow

Yuechao Gao; Nianhong Liu; Sheng Zhang

arXiv:1803.10548 · cs.CV · April 13, 2018

Open Access

TL;DR

This paper presents FPGA implementations of a novel 3D-SIMD processor architecture optimized for deep neural networks, utilizing a new sparse filter encoding format and dataflow to significantly improve computational efficiency on embedded systems.

Contribution

It introduces FPGA implementations of the stacked filters stationary (SFS) dataflow and the relative indexed compressed sparse filter (CSF) encoding format, reporting gains in per-PE computation efficiency over prior methods in neural network processing.

Findings

01

At least 2x improvement in computation efficiency per processing element (PE) on most layers.

02

8x improvement on AlexNet layer CONV4 with 384 filters.

03

11x improvement on VGG16 layer CONV5-3 with 512 filters.

Abstract

It is a challenging task to deploy computationally and memory-intensive state-of-the-art deep neural networks (DNNs) on embedded systems with limited hardware resources and power budgets. Recently developed techniques like Deep Compression make it possible to fit large DNNs, such as AlexNet and VGGNet, fully in on-chip SRAM. But sparse networks compressed using existing encoding formats, like CSR or CSC, complicate computation at runtime due to their irregular memory access characteristics. In [1], we introduce a computation dataflow, stacked filters stationary dataflow (SFS), and a corresponding data encoding format, relative indexed compressed sparse filter format (CSF), to make the best use of data sparsity and simplify data handling at execution time. In this paper we present FPGA implementations of these methods. We implement several compact streaming fully connected (FC) and…
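
The abstract names the relative indexed CSF format but does not spell out its bit layout. As a rough illustration of the general idea of relative indexing only, the sketch below encodes one flattened, pruned filter by storing each nonzero weight together with the number of zeros skipped since the previous stored entry, using a fixed-width index field. When a gap does not fit in that field, a filler zero is stored, similar to the padding approach described for Deep Compression. The function names `encode_relative` and `decode_relative` and the 3-bit index width are hypothetical choices for illustration; this is not the authors' CSF layout.

```python
from typing import List, Tuple


def encode_relative(weights: List[float], index_bits: int = 3) -> Tuple[List[float], List[int]]:
    """Encode a flattened filter as (values, gaps) using relative indices.

    Hypothetical scheme: gaps[i] counts the zeros skipped before values[i].
    Each gap must fit in `index_bits` bits; larger gaps are split by storing
    filler zero entries. Trailing zeros are not stored.
    """
    max_gap = (1 << index_bits) - 1
    values: List[float] = []
    gaps: List[int] = []
    gap = 0
    for w in weights:
        if w == 0:
            gap += 1
            continue
        # A filler entry skips max_gap zeros and itself occupies one zero slot.
        while gap > max_gap:
            values.append(0.0)
            gaps.append(max_gap)
            gap -= max_gap + 1
        values.append(w)
        gaps.append(gap)
        gap = 0
    return values, gaps


def decode_relative(values: List[float], gaps: List[int], length: int) -> List[float]:
    """Rebuild the dense filter of the given length from (values, gaps)."""
    out = [0.0] * length
    pos = -1
    for v, g in zip(values, gaps):
        pos += g + 1  # skip g zeros, then land on the stored entry
        out[pos] = v
    return out


if __name__ == "__main__":
    filt = [0.0] * 2 + [3.0] + [0.0] * 18 + [-1.5, 2.0]
    vals, gaps = encode_relative(filt, index_bits=3)
    print(vals, gaps)  # [3.0, 0.0, 0.0, -1.5, 2.0] [2, 7, 7, 2, 0]
    assert decode_relative(vals, gaps, len(filt)) == filt
```

Because every index is a small, fixed-width offset from the previous entry, a decoder can walk the stored entries sequentially without random lookups, which is the kind of regular access pattern the abstract contrasts with CSR and CSC. The cost is the occasional filler entry when runs of zeros exceed the index range.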

Taxonomy

Topics: Advanced Neural Network Applications · CCD and CMOS Imaging Sensors · Sparse and Compressive Sensing Techniques

Methods: 1x1 Convolution · Convolution · Local Response Normalization · Grouped Convolution · Dropout · Dense Connections · Max Pooling · Softmax