Single Stream Parallelization of Recurrent Neural Networks for Low Power and Fast Inference

Wonyong Sung; Jinhwan Park

arXiv:1803.11389·cs.DC·April 2, 2018·5 cites

Open Access

TL;DR

This paper proposes a parallelization method for single stream RNN inference that executes multiple time steps at a time, which reduces DRAM accesses and power consumption. For SRU on ARM, it reports speed-ups of about 300% and 930% with 4 and 16 time steps, respectively.

Contribution

It introduces a novel parallelization approach for RNN inference that improves speed and energy efficiency by executing multiple time steps concurrently.

Findings

01

About 300% speed-up for SRU when 4 time steps are processed at a time

02

About 930% speed-up for SRU when 16 time steps are processed at a time

03

Reduced DRAM accesses and power consumption
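
The link between processing several time steps at a time and fewer DRAM accesses can be illustrated with a back-of-the-envelope count. The sketch below is not from the paper: it assumes the weights do not fit in cache and must be re-read from DRAM once per block of time steps, and it ignores activations and compute. The function name and the numbers (4 MB of weights, 64 time steps) are hypothetical.

```python
def weight_traffic_bytes(weight_bytes, num_steps, block):
    """Bytes of weights fetched from DRAM for num_steps time steps.

    Assumption: the weights are too large for the cache, so they are
    read from DRAM once per block of `block` time steps.
    """
    num_blocks = -(-num_steps // block)  # ceiling division
    return weight_bytes * num_blocks


# Hypothetical example: 4 MB of weights, 64 time steps.
baseline = weight_traffic_bytes(4_000_000, 64, block=1)
blocked_4 = weight_traffic_bytes(4_000_000, 64, block=4)
blocked_16 = weight_traffic_bytes(4_000_000, 64, block=16)
print(baseline // blocked_4, baseline // blocked_16)
```

Under this simple model, weight traffic falls in proportion to the block size. The measured speed-ups depend on more than weight traffic, so this count should not be read as a prediction of them.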

Abstract

As neural network algorithms show high performance in many applications, their efficient inference on mobile and embedded systems is of great interest. When a single stream recurrent neural network (RNN) is executed for a personal user on an embedded system, it demands a large number of DRAM accesses because the network is usually much larger than the cache and the weights of an RNN are used only once at each time step. We overcome this problem by parallelizing the algorithm and executing multiple time steps at a time. This approach also reduces power consumption by lowering the number of DRAM accesses. QRNN (Quasi-Recurrent Neural Network) and SRU (Simple Recurrent Unit) based recurrent neural networks are used for the implementation. The experiments for SRU showed speed-ups of about 300% and 930% when the number of time steps processed at a time is 4 and 16, respectively, in an ARM…
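
The idea of executing multiple time steps at a time can be sketched in NumPy. This is a minimal illustration, not the authors' implementation: it assumes a simplified single-layer SRU in which the matrix products depend only on the input x_t, so they can be computed for a whole block of time steps as one matrix-matrix product, leaving only a cheap element-wise recurrence to run step by step. The function names, the `block` parameter, and the exact form of the SRU equations used here are assumptions for illustration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def sru_step_by_step(x, W, Wf, bf, Wr, br, c0):
    """Baseline: three matrix-vector products per time step."""
    c = c0.copy()
    h = np.empty_like(x)
    for t in range(x.shape[0]):
        xt = W @ x[t]                      # candidate
        f = sigmoid(Wf @ x[t] + bf)        # forget gate
        r = sigmoid(Wr @ x[t] + br)        # reset (highway) gate
        c = f * c + (1.0 - f) * xt
        h[t] = r * np.tanh(c) + (1.0 - r) * x[t]
    return h, c


def sru_multi_step(x, W, Wf, bf, Wr, br, c0, block=4):
    """Process `block` time steps at a time.

    Each weight matrix is used once per block (matrix-matrix product)
    instead of once per time step (matrix-vector product).
    """
    c = c0.copy()
    h = np.empty_like(x)
    for start in range(0, x.shape[0], block):
        xb = x[start:start + block]        # shape (T, d), T <= block
        xt = xb @ W.T
        f = sigmoid(xb @ Wf.T + bf)
        r = sigmoid(xb @ Wr.T + br)
        # Only this element-wise recurrence is sequential.
        for i in range(xb.shape[0]):
            c = f[i] * c + (1.0 - f[i]) * xt[i]
            h[start + i] = r[i] * np.tanh(c) + (1.0 - r[i]) * xb[i]
    return h, c


rng = np.random.default_rng(0)
d = 8
x = rng.standard_normal((10, d))
W, Wf, Wr = (rng.standard_normal((d, d)) for _ in range(3))
bf, br = rng.standard_normal(d), rng.standard_normal(d)
c0 = np.zeros(d)

h_ref, c_ref = sru_step_by_step(x, W, Wf, bf, Wr, br, c0)
h_blk, c_blk = sru_multi_step(x, W, Wf, bf, Wr, br, c0, block=4)
print(np.allclose(h_ref, h_blk), np.allclose(c_ref, c_blk))
```

Both functions compute the same outputs; the blocked version only changes how often each weight matrix is touched. In a single stream setting, a block can only be processed once all of its inputs have arrived, so larger blocks trade some latency for fewer weight fetches.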

Taxonomy

Topics: Neural Networks and Applications · Advanced Neural Network Applications · Parallel Computing and Optimization Techniques

Methods: Highway Layer · SRU · Convolution · Sigmoid Activation · Tanh Activation · Masked Convolution · Quasi-Recurrent Neural Network