GPU Asynchronous Stochastic Gradient Descent to Speed Up Neural Network Training

Thomas Paine; Hailin Jin; Jianchao Yang; Zhe Lin; Thomas Huang

arXiv:1312.6186 · cs.CV · December 24, 2013 · ICLR · 67 cites

TL;DR

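This paper introduces GPU A-SGD, a system that combines model parallelism (GPU-accelerated training) with data parallelism (asynchronous SGD) to speed up training of large convolutional neural networks for computer vision, with the aim of making larger models on larger datasets trainable in reasonable time.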

Contribution

The paper presents GPU A-SGD, a novel system that integrates model and data parallelism to improve training speed for large neural networks.
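For reference, the data-parallel half of this combination is usually described by the generic parameter-server form of asynchronous SGD shown below. It is given as an illustrative assumption, not as the exact update rule used in the paper: K workers each hold a shard of the training set, worker k reads the shared weights, computes a minibatch gradient, and the server applies that gradient as soon as it arrives, without waiting for the other workers.

```latex
w_{t+1} = w_t - \eta \, \nabla_w L\!\left(w_{t-\tau_k};\, B_k\right), \qquad \tau_k \ge 0
```

Here η is the learning rate, B_k is a minibatch drawn from worker k's shard, and τ_k is the staleness, i.e. the number of updates other workers applied between worker k reading the weights and its gradient being applied. With a single worker, τ_k = 0 and the rule reduces to ordinary minibatch SGD.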

Findings

1. GPU A-SGD can speed up training of large convolutional neural networks.

2. The authors expect it to make training larger models on larger datasets feasible in a reasonable amount of time.

3. The reported results come from early experiments.

Abstract

The ability to train large-scale neural networks has resulted in state-of-the-art performance in many areas of computer vision. These results have largely come from computational breakthroughs of two forms: model parallelism, e.g. GPU-accelerated training, which has seen quick adoption in computer vision circles, and data parallelism, e.g. A-SGD, which has been used at large scale mostly in industry. We report early experiments with a system that makes use of both model parallelism and data parallelism, which we call GPU A-SGD. We show that, using GPU A-SGD, it is possible to speed up training of large convolutional neural networks useful for computer vision. We believe GPU A-SGD will make it possible to train larger networks on larger training sets in a reasonable amount of time.
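The data-parallel (A-SGD) side of the abstract can be illustrated with a toy asynchronous training loop. This is a minimal CPU-only sketch under stated assumptions, not the authors' implementation: the `ParameterServer` class, the `worker`, `make_data`, and `train_async` helpers, the linear least-squares model, and all hyperparameters are hypothetical choices made for this example. Each thread plays the role a GPU worker would play in GPU A-SGD: it pulls the current (possibly stale) weights, computes a minibatch gradient on its own data shard, and pushes the gradient back without waiting for the other workers.

```python
import random
import threading

# Hypothetical toy example of asynchronous SGD with a parameter server.
# Threads stand in for GPU workers; the model is linear least squares.


class ParameterServer:
    """Holds the shared weights; workers pull them and push gradients."""

    def __init__(self, dim, lr):
        self.weights = [0.0] * dim
        self.lr = lr
        self.lock = threading.Lock()
        self.updates = 0

    def pull(self):
        # Return a snapshot; it may be stale by the time its gradient is pushed.
        with self.lock:
            return list(self.weights)

    def push(self, grad):
        # Apply each gradient immediately, with no barrier across workers.
        with self.lock:
            for i, g in enumerate(grad):
                self.weights[i] -= self.lr * g
            self.updates += 1


def gradient(w, batch):
    """Gradient of the mean squared error of y ~ w.x over one minibatch."""
    grad = [0.0] * len(w)
    for x, y in batch:
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        for i, xi in enumerate(x):
            grad[i] += 2.0 * err * xi / len(batch)
    return grad


def worker(server, shard, steps, batch_size, seed):
    rng = random.Random(seed)
    for _ in range(steps):
        w = server.pull()                      # read possibly stale weights
        batch = rng.sample(shard, batch_size)  # minibatch from this worker's shard
        server.push(gradient(w, batch))        # other workers keep running


def make_data(true_w, n, seed):
    """Noise-free synthetic data y = true_w . x with x uniform in [-1, 1]."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = [rng.uniform(-1.0, 1.0) for _ in true_w]
        data.append((x, sum(a * b for a, b in zip(true_w, x))))
    return data


def train_async(true_w, num_workers=4, steps=500, batch_size=8, lr=0.1):
    data = make_data(true_w, 400, seed=0)
    # Data parallelism: each worker gets its own disjoint shard.
    shards = [data[k::num_workers] for k in range(num_workers)]
    server = ParameterServer(len(true_w), lr)
    threads = [
        threading.Thread(target=worker, args=(server, shards[k], steps, batch_size, k))
        for k in range(num_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return server
```

Calling `train_async([2.0, -3.0, 0.5])` should return a server whose `weights` are close to the true coefficients after 4 × 500 updates. Because workers never synchronize, some gradients are computed on weights that other workers have already changed; this staleness is the trade-off A-SGD accepts in exchange for removing the synchronization barrier. The GPU (model-parallel) side, i.e. running each worker's forward and backward passes on a GPU, is not modeled here.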

Taxonomy

Topics: Advanced Neural Network Applications · Stochastic Gradient Optimization Techniques · Adversarial Robustness in Machine Learning