Brief Announcement: On the Limits of Parallelizing Convolutional Neural Networks on GPUs
Behnam Pourghassemi (1), Chenghao Zhang (1), Joo Hwan Lee (2), Aparna Chandramowlishwaran (1) ((1) University of California, Irvine, (2) Samsung Semiconductor)

TL;DR
This paper discusses the limitations of current GPU-based training of deep neural networks, especially networks with non-linear architectures, and argues for exploiting inter-operation parallelism to reduce training time, outlining the challenges and potential solutions.
Contribution
The paper identifies the constraints in current GPU deep learning frameworks that keep independent layers from running concurrently, and proposes approaches to enable concurrent execution of layers in non-linear neural networks.
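One way to see which layers could be launched concurrently is to group a layer graph into dependency "waves": every operation in a wave depends only on operations in earlier waves, so the operations within one wave are candidates for concurrent launch. The sketch below is a minimal illustration under stated assumptions, not the authors' method; the graph is a hypothetical bottleneck-style residual block with a 1x1 projection shortcut, and all operation names are made up for the example.

```python
from collections import defaultdict


def wavefronts(deps):
    """Group ops into waves; each op's predecessors all lie in earlier waves.

    `deps` maps every op name to the list of ops it consumes (sources map to []).
    """
    indegree = {op: len(preds) for op, preds in deps.items()}
    users = defaultdict(list)  # op -> ops that consume its output
    for op, preds in deps.items():
        for p in preds:
            users[p].append(op)

    ready = sorted(op for op, d in indegree.items() if d == 0)
    waves = []
    while ready:
        waves.append(ready)
        next_ready = []
        for op in ready:
            for u in users[op]:
                indegree[u] -= 1
                if indegree[u] == 0:  # its last input just became available
                    next_ready.append(u)
        ready = sorted(next_ready)

    if sum(len(w) for w in waves) != len(deps):
        raise ValueError("dependency graph contains a cycle")
    return waves


# Hypothetical residual block: main path conv_a -> ... -> bn_b, plus a
# projection shortcut conv_proj -> bn_proj, merged by an element-wise add.
residual_block = {
    "conv_a": [], "bn_a": ["conv_a"], "relu_a": ["bn_a"],
    "conv_b": ["relu_a"], "bn_b": ["conv_b"],
    "conv_proj": [], "bn_proj": ["conv_proj"],
    "add": ["bn_b", "bn_proj"], "relu_out": ["add"],
}

for i, wave in enumerate(wavefronts(residual_block)):
    print(i, wave)
```

Wave 0 pairs the two convolutions `conv_a` and `conv_proj`, which a framework that launches convolutions serially would run one after the other. In PyTorch, for instance, independent operations can be issued on separate `torch.cuda.Stream` objects; whether they actually overlap on the GPU depends on how much of the device each kernel already occupies.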
Findings
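Popular frameworks such as TensorFlow and PyTorch launch most operations, especially convolutions, serially on the GPU, missing inter-operation parallelism.
Non-linear networks such as ResNet, PathNet, and GoogLeNet expose more inter-operation parallelism than linear networks such as AlexNet.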
Exploiting this parallelism can significantly reduce training time.
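The effect of serial versus concurrent launch can be illustrated with a small greedy list-scheduling simulation. This is a simplified model, not the paper's measurement: each operation has a hypothetical fixed cost, at most `slots` operations run at once, and concurrent operations are assumed not to slow each other down (on a real GPU, concurrent kernels compete for SMs and memory bandwidth). With `slots=1` the schedule reduces to serial launch; with enough slots it reaches the critical-path length of the graph.

```python
import heapq


def list_schedule(costs, deps, slots):
    """Greedy list scheduling of a DAG; returns the makespan.

    `costs` maps op -> run time, `deps` maps op -> list of predecessor ops,
    and at most `slots` ops run concurrently. Ties are broken by op name.
    """
    remaining = {op: len(deps[op]) for op in costs}  # unfinished predecessors
    users = {op: [] for op in costs}
    for op, preds in deps.items():
        for p in preds:
            users[p].append(op)

    ready = sorted(op for op in costs if remaining[op] == 0)
    running = []  # min-heap of (finish_time, op)
    now, finished = 0, 0
    while finished < len(costs):
        # Fill every free slot with a ready op.
        while ready and len(running) < slots:
            op = ready.pop(0)
            heapq.heappush(running, (now + costs[op], op))
        if not running:
            raise ValueError("dependency graph contains a cycle")
        # Advance time to the next completion and release its consumers.
        now, op = heapq.heappop(running)
        finished += 1
        for u in users[op]:
            remaining[u] -= 1
            if remaining[u] == 0:
                ready.append(u)
        ready.sort()
    return now


# Hypothetical costs (arbitrary time units) for an Inception-style block with
# four parallel branches merged by a concatenation: op -> (deps, cost).
inception = {
    "b1_1x1": ([], 1),
    "b2_1x1": ([], 1), "b2_3x3": (["b2_1x1"], 3),
    "b3_1x1": ([], 1), "b3_5x5": (["b3_1x1"], 4),
    "b4_pool": ([], 1), "b4_1x1": (["b4_pool"], 1),
    "concat": (["b1_1x1", "b2_3x3", "b3_5x5", "b4_1x1"], 1),
}
# A linear chain with the same total cost, as in AlexNet-style networks.
chain_names = [f"layer{i}" for i in range(13)]
chain = {name: ([chain_names[i - 1]] if i else [], 1)
         for i, name in enumerate(chain_names)}

for label, graph in [("inception", inception), ("chain", chain)]:
    deps = {op: d for op, (d, _) in graph.items()}
    costs = {op: c for op, (_, c) in graph.items()}
    print(label, [list_schedule(costs, deps, s) for s in (1, 2, 4)])
```

Under these assumptions the Inception-style block finishes in 6 time units with four slots instead of 13 when launched serially, while the linear chain takes 13 units regardless of the number of slots, since each layer must wait for the previous one.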
Abstract
GPUs are currently the platform of choice for training neural networks. However, training a deep neural network (DNN) is a time-consuming process even on GPUs because of the massive number of parameters that have to be learned. As a result, accelerating DNN training has been an area of significant research in the last couple of years. While earlier networks such as AlexNet had a linear dependency between layers and operations, state-of-the-art networks such as ResNet, PathNet, and GoogLeNet have a non-linear structure that exhibits a higher level of inter-operation parallelism. However, popular deep learning (DL) frameworks such as TensorFlow and PyTorch launch the majority of neural network operations, especially convolutions, serially on GPUs and do not exploit this inter-op parallelism. In this brief announcement, we make a case for the need and potential benefit of exploiting this…
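The potential benefit can be bounded with the standard work-span argument from parallel scheduling (a textbook model, not a result from this paper). Treat one training step as a DAG G = (V, E) of operations with run times t_v; let W be the total work, S the length of the longest (critical) path, and T_p the time to run the graph with at most p operations in flight. Serial launch corresponds to T_1 = W, the lower bound holds for any schedule, and the upper bound is Graham's guarantee for a greedy list scheduler:

```latex
\begin{aligned}
W &= \sum_{v \in V} t_v, \qquad S = \max_{P \text{ path in } G} \sum_{v \in P} t_v, \\
\max\!\left(\frac{W}{p},\, S\right) &\le T_p \le \frac{W - S}{p} + S, \qquad \frac{T_1}{T_p} \le \frac{W}{S}.
\end{aligned}
```

For a purely linear network S = W, so no inter-operation speedup is possible; branching structures such as Inception modules or residual blocks with projection shortcuts have S < W. The model assumes each operation occupies one of p identical slots and that concurrent operations do not interfere, which real GPU kernels sharing SMs and memory bandwidth do not guarantee.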
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Advanced Graph Neural Networks
Methods: 1x1 Convolution · Softmax · Bottleneck Residual Block · Batch Normalization · Inception Module · Dropout · Dense Connections · Max Pooling · Global Average Pooling
