Brief Announcement: On the Limits of Parallelizing Convolutional Neural Networks on GPUs

Behnam Pourghassemi (1), Chenghao Zhang (1), Joo Hwan Lee (2), Aparna Chandramowlishwaran (1) ((1) University of California, Irvine; (2) Samsung Semiconductor)

arXiv:2005.13823·cs.DC·May 29, 2020

Open Access

TL;DR

This paper examines why current GPU-based training of deep neural networks, especially networks with non-linear architectures, leaves parallelism unused. It argues for exploiting inter-operation parallelism to reduce training time and outlines the challenges and potential solutions.

Contribution

The paper identifies the constraints in current GPU deep learning frameworks and proposes approaches to enable concurrent execution of independent layers in non-linear neural networks.

Findings

01

Popular frameworks such as TensorFlow and PyTorch launch most operations, especially convolutions, serially on GPUs, missing parallelism opportunities.

02

Non-linear networks such as ResNet exhibit more inter-operation parallelism than earlier linear networks such as AlexNet.

03

Exploiting this parallelism can significantly reduce training time.
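
To make the notion of inter-operation parallelism concrete, here is a minimal sketch that counts how many operations sit at each topological level of a network's dependency graph. The function name `level_widths`, the operation names, and the two toy graphs are illustrative assumptions, not taken from the paper; a level width above 1 only indicates that operations are independent of each other, not that a GPU could actually run them concurrently.

```python
from collections import defaultdict, deque


def level_widths(nodes, edges):
    """Count operations per topological level of a DAG (hypothetical helper).

    nodes: iterable of operation names
    edges: iterable of (producer, consumer) pairs
    """
    indegree = {n: 0 for n in nodes}
    successors = defaultdict(list)
    for producer, consumer in edges:
        successors[producer].append(consumer)
        indegree[consumer] += 1

    # Operations with no inputs start at level 0.
    level = {n: 0 for n in nodes if indegree[n] == 0}
    queue = deque(level)
    while queue:
        op = queue.popleft()
        for nxt in successors[op]:
            # An operation sits one level below its deepest predecessor.
            level[nxt] = max(level.get(nxt, 0), level[op] + 1)
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)

    widths = defaultdict(int)
    for depth in level.values():
        widths[depth] += 1
    return [widths[d] for d in range(max(widths) + 1)]


# Toy linear chain (AlexNet-style dependency pattern, heavily simplified).
chain_nodes = ["conv1", "conv2", "conv3", "fc"]
chain_edges = [("conv1", "conv2"), ("conv2", "conv3"), ("conv3", "fc")]

# Toy Inception-style block with four independent branches (assumed layout).
block_nodes = ["input", "b1_1x1", "b2_1x1", "b2_3x3",
               "b3_1x1", "b3_5x5", "pool", "pool_1x1", "concat"]
block_edges = [
    ("input", "b1_1x1"), ("b1_1x1", "concat"),
    ("input", "b2_1x1"), ("b2_1x1", "b2_3x3"), ("b2_3x3", "concat"),
    ("input", "b3_1x1"), ("b3_1x1", "b3_5x5"), ("b3_5x5", "concat"),
    ("input", "pool"), ("pool", "pool_1x1"), ("pool_1x1", "concat"),
]

chain_widths = level_widths(chain_nodes, chain_edges)   # [1, 1, 1, 1]
block_widths = level_widths(block_nodes, block_edges)   # [1, 4, 3, 1]
```

In this toy comparison the chain never has more than one ready operation, while the branched block has up to four, which is the kind of structure a serial launch order cannot take advantage of.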

Abstract

GPUs are currently the platform of choice for training neural networks. However, training a deep neural network (DNN) is a time-consuming process even on GPUs because of the massive number of parameters that have to be learned. As a result, accelerating DNN training has been an area of significant research in the last couple of years. While earlier networks such as AlexNet had a linear dependency between layers and operations, state-of-the-art networks such as ResNet, PathNet, and GoogleNet have a non-linear structure that exhibits a higher level of inter-operation parallelism. However, popular deep learning (DL) frameworks such as TensorFlow and PyTorch launch the majority of neural network operations, especially convolutions, serially on GPUs and do not exploit this inter-op parallelism. In this brief announcement, we make a case for the need and potential benefit of exploiting this…
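
As a rough illustration of the potential benefit the abstract refers to, the sketch below compares the time of a fully serial launch order against the critical path of the dependency graph. The helper `serial_and_critical_path`, the operation names, and the cost values are hypothetical; the resulting ratio is an idealized upper bound that assumes unlimited GPU resources and no contention between concurrent kernels, so it is not a prediction of real speedup.

```python
def serial_and_critical_path(costs, deps):
    """Return (serial_time, critical_path_time) for a DAG of operations.

    costs: dict mapping operation name -> execution time (arbitrary units)
    deps:  dict mapping operation name -> list of prerequisite operations
    """
    finish = {}

    def finish_time(op):
        # Earliest finish: own cost plus the latest-finishing prerequisite.
        if op not in finish:
            ready_at = max((finish_time(d) for d in deps.get(op, [])), default=0)
            finish[op] = ready_at + costs[op]
        return finish[op]

    serial_time = sum(costs.values())
    critical_time = max(finish_time(op) for op in costs)
    return serial_time, critical_time


# Toy residual-style block: two stacked convolutions plus a projection
# shortcut that is independent of them (costs are made-up numbers).
costs = {"conv_a": 4, "conv_b": 4, "shortcut_conv": 2, "add": 1}
deps = {"conv_b": ["conv_a"], "add": ["conv_b", "shortcut_conv"]}

serial_time, critical_time = serial_and_critical_path(costs, deps)
ideal_speedup_bound = serial_time / critical_time  # 11 / 9 in this toy case
```

Here the shortcut convolution could overlap with the main branch, so the critical path (9 units) is shorter than the serial schedule (11 units). How much of that gap is recoverable on a real GPU depends on whether individual kernels already saturate the device.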

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Advanced Graph Neural Networks

Methods: 1x1 Convolution · Softmax · Bottleneck Residual Block · Batch Normalization · Inception Module · Dropout · Dense Connections · Max Pooling · Global Average Pooling