Poseidon: A System Architecture for Efficient GPU-based Deep Learning on Multiple Machines
Hao Zhang, Zhiting Hu, Jinliang Wei, Pengtao Xie, Gunhee Kim, Qirong Ho, Eric Xing

TL;DR
Poseidon is a scalable system architecture that enhances distributed GPU-based deep learning training across multiple machines by reducing communication bottlenecks and improving GPU utilization, achieving significant speedups.
Contribution
It introduces a three-level hybrid architecture, a distributed wait-free backpropagation algorithm, and a structure-aware communication protocol for efficient multi-machine GPU training.
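One structural property a structure-aware protocol can exploit is that a fully connected layer's mini-batch gradient is a sum of outer products, so a worker can send the small factor vectors instead of the full matrix when that is cheaper. The sketch below illustrates this trade-off under a simplified cost model of our own (floats moved per worker per iteration). The function names, the cost formulas, and the example layer sizes are assumptions for illustration, not the paper's exact decision rule.

```python
# Hypothetical sketch: choosing between full-matrix (parameter-server style)
# and sufficient-factor communication for one fully connected layer.
# The cost model is a simplified assumption, not Poseidon's exact rule.


def ps_floats_per_worker(m: int, n: int) -> int:
    """Floats one worker moves per iteration if it pushes its M x N gradient
    to a parameter server and pulls back the updated M x N matrix."""
    return 2 * m * n


def sfb_floats_per_worker(m: int, n: int, batch: int, workers: int) -> int:
    """Floats one worker moves if it sends its `batch` factor pairs
    (u in R^M, v in R^N) to each of the other workers and receives theirs."""
    return 2 * batch * (m + n) * (workers - 1)


def choose_scheme(m: int, n: int, batch: int, workers: int) -> str:
    """Return "sfb" when sufficient factors are cheaper under this toy model, else "ps"."""
    if sfb_floats_per_worker(m, n, batch, workers) < ps_floats_per_worker(m, n):
        return "sfb"
    return "ps"


def rebuild_gradient(us: list[list[float]], vs: list[list[float]]) -> list[list[float]]:
    """Reconstruct the M x N gradient sum_k u_k v_k^T from received factors.
    For y = W x, u_k is dL/dy for sample k and v_k is that sample's input x."""
    m, n = len(us[0]), len(vs[0])
    grad = [[0.0] * n for _ in range(m)]
    for u, v in zip(us, vs):
        for i in range(m):
            for j in range(n):
                grad[i][j] += u[i] * v[j]
    return grad


if __name__ == "__main__":
    # A 4096 x 4096 layer with batch 32 on 8 workers favors sufficient factors.
    print(choose_scheme(4096, 4096, batch=32, workers=8))
    # A tiny 10 x 10 layer favors sending the full matrix.
    print(choose_scheme(10, 10, batch=32, workers=8))
```

Under this model, the 4096 x 4096 case moves about 3.7M floats per worker with sufficient factors versus about 33.6M with full matrices. The advantage shrinks as the batch size or the number of workers grows, which is why choosing the scheme per layer, rather than fixing one scheme for the whole network, is plausible.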
Findings
Achieves up to 4.5x speedup on AlexNet with 8 nodes
Converges to the same objective values as single-machine training
Outperforms CPU-based distributed systems on large datasets
Abstract
Deep learning (DL) has achieved notable successes in many machine learning tasks. A number of frameworks have been developed to expedite the process of designing and training deep neural networks (DNNs), such as Caffe, Torch and Theano. Currently, they can harness multiple GPUs on a single machine but are unable to use GPUs that are distributed across multiple machines. Because even average-sized DNNs can take days to train on a single GPU with 100s of GBs to TBs of data, distributed GPUs present a prime opportunity for scaling up DL. However, the limited bandwidth available on commodity Ethernet networks presents a bottleneck to distributed GPU training and prevents its trivial realization. To investigate how to adapt existing frameworks to efficiently support distributed GPUs, we propose Poseidon, a scalable system architecture for distributed inter-machine communication in existing DL…
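To make the bandwidth bottleneck concrete, the back-of-envelope estimate below computes how long one dense exchange of a model's float32 parameters or gradients takes over a network link. The parameter count (roughly AlexNet's ~61M) and the link speeds are illustrative assumptions, not figures from the paper, and the model ignores protocol overhead, latency, and any server-side fan-in.

```python
# Back-of-envelope estimate of how long one full gradient exchange takes
# over Ethernet. Parameter count and link speeds are illustrative assumptions.


def transfer_seconds(num_params: int, link_gbps: float, bytes_per_param: int = 4) -> float:
    """Seconds to send one dense copy of `num_params` float32 values over a link
    of `link_gbps` gigabits per second, ignoring protocol overhead and latency."""
    payload_bits = num_params * bytes_per_param * 8
    return payload_bits / (link_gbps * 1e9)


if __name__ == "__main__":
    params = 61_000_000  # roughly AlexNet's parameter count (assumption)
    for gbps in (1.0, 10.0, 40.0):
        secs = transfer_seconds(params, gbps)
        print(f"{gbps:5.1f} Gb/s: {secs:.3f} s per one-way exchange")
```

At 1 Gb/s, a single one-way transfer of about 244 MB takes roughly 1.95 s, and synchronous training needs at least a push and a pull per iteration. If a GPU finishes an iteration's computation in less time than that (an assumption that depends on the model, batch size, and hardware), the network rather than the GPU sets the pace.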
Taxonomy
Topics: Advanced Neural Network Applications · Stochastic Gradient Optimization Techniques · Advanced Image and Video Retrieval Techniques
Methods: 1x1 Convolution · Convolution · Local Response Normalization · Grouped Convolution · Dropout · Dense Connections · Max Pooling · Softmax
