Poseidon: A System Architecture for Efficient GPU-based Deep Learning on Multiple Machines

Hao Zhang; Zhiting Hu; Jinliang Wei; Pengtao Xie; Gunhee Kim; Qirong Ho; Eric Xing

arXiv:1512.06216·cs.LG·December 22, 2015·43 cites

Open Access

TL;DR

Poseidon is a scalable system architecture that speeds up distributed GPU-based deep learning training across multiple machines by reducing communication bottlenecks and improving GPU utilization, achieving up to 4.5x speedup on AlexNet with 8 nodes.

Contribution

It introduces a three-level hybrid architecture, a distributed wait-free backpropagation algorithm, and a structure-aware communication protocol for efficient multi-machine GPU training.
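The wait-free backpropagation idea can be illustrated by how it schedules work: rather than waiting for the entire backward pass to finish, each layer's gradients start synchronizing as soon as that layer is processed, while the layers below it are still computing. The sketch below shows only this scheduling pattern in plain Python, assuming several simulated workers in one process. `compute_layer_grad` and `all_reduce_mean` are hypothetical stand-ins for GPU backprop and network synchronization, not Poseidon's API.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def compute_layer_grad(layer, worker_inputs):
    """Hypothetical stand-in: each simulated worker's local gradient for `layer`."""
    return [x * (layer + 1) for x in worker_inputs]

def all_reduce_mean(layer, grads):
    """Hypothetical stand-in for synchronizing one layer's gradients over the network."""
    time.sleep(0.01)  # pretend this is network transfer time
    return layer, sum(grads) / len(grads)

def wait_free_backprop(num_layers, worker_inputs):
    log = []
    # Pool threads perform synchronization while the main thread keeps backpropagating.
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = []
        for layer in reversed(range(num_layers)):  # backprop runs top layer first
            grads = compute_layer_grad(layer, worker_inputs)
            log.append(("computed", layer))
            # Start syncing this layer now instead of after the whole backward pass.
            futures.append(pool.submit(all_reduce_mean, layer, grads))
        # Block only at the end, once every layer's transfer has been issued.
        synced = dict(f.result() for f in futures)
    return synced, log

if __name__ == "__main__":
    synced, log = wait_free_backprop(3, [1.0, 2.0, 3.0])
    print(log)
    print(synced)
```

In this schedule the top layers finish backprop first, so their transfers get the longest window to overlap with the remaining computation. In AlexNet-style networks those top layers are the large fully connected ones.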

Findings

01

Achieves up to 4.5x speedup on AlexNet with 8 nodes

02

Converges to same objectives as single-machine training

03

Outperforms CPU-based distributed systems on large datasets
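Assuming the 4.5x figure is throughput relative to a single node, it corresponds to the following parallel efficiency on 8 nodes, against an ideal linear speedup of 8x:

```latex
E = \frac{S_N}{N} = \frac{4.5}{8} \approx 0.56
```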

Abstract

Deep learning (DL) has achieved notable successes in many machine learning tasks. A number of frameworks, such as Caffe, Torch and Theano, have been developed to expedite the process of designing and training deep neural networks (DNNs). Currently they can harness multiple GPUs on a single machine but are unable to use GPUs distributed across multiple machines. Because even average-sized DNNs can take days to train on a single GPU with hundreds of gigabytes to terabytes of data, distributed GPUs present a prime opportunity for scaling up DL. However, the limited bandwidth available on commodity Ethernet networks presents a bottleneck to distributed GPU training and prevents its trivial realization. To investigate how to adapt existing frameworks to efficiently support distributed GPUs, we propose Poseidon, a scalable system architecture for distributed inter-machine communication in existing DL…
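A back-of-envelope estimate makes the Ethernet bottleneck concrete. The sketch below is ours, not the paper's. It assumes fp32 gradients, an ideal link with no protocol overhead, AlexNet at roughly 60 million parameters, and a fully connected layer of shape 4096 x 9216 (AlexNet's fc6). It also compares two ways of synchronizing that layer. The first is a simplified parameter-server exchange (push the full gradient matrix, pull the full updated matrix). The second is sufficient-factor broadcasting to every peer, a structure-aware option for fully connected layers. The function names and the cost model are hypothetical simplifications, not Poseidon's exact formulas.

```python
BYTES_PER_FLOAT32 = 4

def transfer_seconds(num_floats: int, gbit_per_s: float) -> float:
    """Ideal one-direction time to move num_floats fp32 values, ignoring protocol overhead."""
    bytes_per_s = gbit_per_s * 1e9 / 8
    return num_floats * BYTES_PER_FLOAT32 / bytes_per_s

def ps_floats_per_worker(m: int, n: int) -> int:
    """Simplified parameter-server cost: push an m x n gradient, pull m x n weights."""
    return 2 * m * n

def sfb_floats_per_worker(m: int, n: int, batch: int, peers: int) -> int:
    """Sufficient-factor broadcast: an FC gradient over a batch is a sum of `batch`
    rank-1 outer products u_k v_k^T, so each worker sends batch * (m + n) floats
    to each peer instead of the full m x n matrix."""
    return peers * batch * (m + n)

def choose_scheme(m: int, n: int, batch: int, peers: int) -> str:
    """Pick whichever scheme moves fewer floats under this simplified model."""
    if sfb_floats_per_worker(m, n, batch, peers) < ps_floats_per_worker(m, n):
        return "SFB"
    return "PS"

if __name__ == "__main__":
    # About 60M fp32 parameters sent once over a 1 Gbit/s link.
    print(f"AlexNet gradient over 1 GbE: {transfer_seconds(60_000_000, 1):.2f} s")
    # fc6-sized layer, batch of 256, 8 machines (7 peers per worker).
    m, n, batch, peers = 4096, 9216, 256, 7
    print(ps_floats_per_worker(m, n), sfb_floats_per_worker(m, n, batch, peers),
          choose_scheme(m, n, batch, peers))
```

Under these assumptions, one uncompressed AlexNet gradient needs about 1.92 s to cross a 1 GbE link in a single direction. For the fc6-sized layer, broadcasting sufficient factors moves 23,855,104 floats per worker, compared with 75,497,472 for the parameter-server exchange. For small layers the comparison flips: `choose_scheme(64, 64, 256, 7)` returns `"PS"`. This is why choosing the scheme per layer, based on layer structure, can pay off.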

Taxonomy

Topics: Advanced Neural Network Applications · Stochastic Gradient Optimization Techniques · Advanced Image and Video Retrieval Techniques

Methods: 1x1 Convolution · Convolution · Local Response Normalization · Grouped Convolution · Dropout · Dense Connections · Max Pooling · Softmax