Optimizing Performance of Recurrent Neural Networks on GPUs

Jeremy Appleyard; Tomas Kocisky; Phil Blunsom

arXiv:1604.01946 · cs.LG · April 8, 2016 · 79 cites

TL;DR

This paper presents a series of optimization techniques for recurrent neural networks on GPUs, achieving an order of magnitude speedup over a naive implementation by exposing parallelism between operations within the network.

Contribution

The paper describes three stages of optimization (a single cell, a single layer, and the entire network) that have been incorporated into the fifth release of NVIDIA's cuDNN to improve RNN performance on GPUs.

Findings

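01. Achieved an order of magnitude speedup over a naive implementation across a range of network sizes.
02. Optimizations are applied at three levels: a single cell, a single layer, and the entire network.
03. The optimizations are incorporated into the fifth release of NVIDIA's cuDNN library.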

Abstract

As recurrent neural networks become larger and deeper, training times for single networks are rising into weeks or even months. As such there is a significant incentive to improve the performance and scalability of these networks. While GPUs have become the hardware of choice for training and deploying recurrent models, the implementations employed often make use of only basic optimizations for these architectures. In this article we demonstrate that by exposing parallelism between operations within the network, an order of magnitude speedup across a range of network sizes can be achieved over a naive implementation. We describe three stages of optimization that have been incorporated into the fifth release of NVIDIA's cuDNN: firstly optimizing a single cell, secondly a single layer, and thirdly the entire network.
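
One way to see where network-level parallelism can come from is to look at the dependencies between cells. The sketch below is an illustration only, not the cuDNN implementation: it assumes a standard stacked, unidirectional RNN in which the cell at (layer, step) depends only on the cell below it (layer - 1, step) and the cell before it (layer, step - 1). Under that assumption, all cells with the same value of layer + step are independent and could be scheduled concurrently. The function name `wavefront_schedule` is hypothetical.

```python
from collections import defaultdict


def wavefront_schedule(num_layers, num_steps):
    """Group the cells of a stacked RNN into waves of independent cells.

    Assumption: cell (layer, step) depends only on (layer - 1, step) and
    (layer, step - 1), so cells sharing the same layer + step value have
    no dependencies on each other.
    """
    waves = defaultdict(list)
    for layer in range(num_layers):
        for step in range(num_steps):
            waves[layer + step].append((layer, step))
    # Return the waves in dependency order: wave k only needs waves < k.
    return [waves[k] for k in sorted(waves)]


schedule = wavefront_schedule(num_layers=4, num_steps=6)
print("number of waves:", len(schedule))
print("cells in the widest wave:", max(len(wave) for wave in schedule))
```

For 4 layers and 6 time steps this grouping gives 9 waves rather than 24 strictly sequential cell evaluations, with up to 4 cells available to run at the same time. How much of that concurrency turns into real speedup depends on the hardware and on the size of each cell's work.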

Code & Models

Repositories

parallel-forall/code-samples (Official)

Taxonomy

Topics: Advanced Neural Network Applications · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning