Poseidon: An Efficient Communication Architecture for Distributed Deep Learning on GPU Clusters
Hao Zhang, Zeyu Zheng, Shizhen Xu, Wei Dai, Qirong Ho, Xiaodan Liang, Zhiting Hu, Jinliang Wei, Pengtao Xie, Eric P. Xing

TL;DR
Poseidon is a communication architecture for distributed deep learning on GPU clusters. It overlaps communication with computation and, for each layer, chooses the synchronization scheme that sends the fewest bytes, yielding near-linear speed-ups (e.g., 31.5x on 32 GPUs).
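Why overlapping helps can be seen with a toy timing model. This is a minimal sketch, not Poseidon's implementation. It assumes that per-layer backward and communication times are known, that one network link serializes all transfers, and that the forward pass can be ignored. The function name and its parameters are hypothetical.

```python
from typing import Sequence


def iteration_time(backward: Sequence[float], comm: Sequence[float], overlap: bool) -> float:
    """Estimate the backward-plus-synchronization time of one iteration.

    backward[i] and comm[i] are hypothetical backward-compute and communication
    times for layer i, where index 0 is the bottom (input) layer. Backpropagation
    visits layers top-down, and all layers share a single network link.
    """
    if len(backward) != len(comm):
        raise ValueError("backward and comm must have one entry per layer")

    if not overlap:
        # Synchronization waits for the whole backward pass, then runs layer by layer.
        return sum(backward) + sum(comm)

    compute_clock = 0.0  # time at which the current layer's gradient is ready
    link_free = 0.0      # time at which the network link next becomes idle
    for b, c in zip(reversed(backward), reversed(comm)):
        compute_clock += b
        # This layer's transfer starts once its gradient exists and the link is idle,
        # while backpropagation continues into the next (lower) layer.
        start = max(compute_clock, link_free)
        link_free = start + c
    return max(compute_clock, link_free)


costs = [1.0, 1.0, 1.0]
print(iteration_time(costs, costs, overlap=False))
print(iteration_time(costs, costs, overlap=True))
```

With three equally expensive layers, sequential synchronization takes 6.0 time units. Overlapped synchronization takes 4.0 units, because only the last layer's transfer remains exposed after backpropagation ends. Transfers are also spread across the backward pass rather than issued in one burst.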
Contribution
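Poseidon exploits the layer-by-layer structure of DL models to overlap communication with computation. It also introduces a hybrid communication scheme that reduces synchronization overhead. The architecture works with multiple DL frameworks (Caffe and TensorFlow), improving distributed training efficiency.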
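A hybrid scheme picks, for each layer, whichever synchronization method moves fewer bytes. The sketch below makes several assumptions that the text above does not state. The two candidate methods are assumed to be a parameter server (PS) and sufficient factor broadcasting (SFB). SFB is assumed to apply only to fully connected layers, whose per-sample gradient is an outer product u vᵀ. The byte counts are simplified: no sharding, no compression. All function names are hypothetical.

```python
def ps_bytes(m: int, n: int, bytes_per_value: int = 4) -> int:
    """Per-worker bytes to push an M x N gradient to a parameter server
    and pull the updated M x N matrix back (simplified model)."""
    return 2 * m * n * bytes_per_value


def sfb_bytes(m: int, n: int, batch: int, workers: int, bytes_per_value: int = 4) -> int:
    """Per-worker bytes to send `batch` sufficient-factor pairs (u in R^M, v in R^N)
    to each of the other workers and receive theirs in return (simplified model)."""
    return 2 * batch * (m + n) * (workers - 1) * bytes_per_value


def choose_scheme(m: int, n: int, batch: int, workers: int, fully_connected: bool) -> str:
    """Return 'SFB' or 'PS' for one layer, whichever the cost model says is cheaper."""
    if not fully_connected:
        # Assumption: only fully connected gradients decompose into sufficient factors.
        return "PS"
    return "SFB" if sfb_bytes(m, n, batch, workers) < ps_bytes(m, n) else "PS"


# Hypothetical 4096 x 4096 fully connected layer, per-worker batch of 32.
print(choose_scheme(4096, 4096, batch=32, workers=8, fully_connected=True))
print(choose_scheme(4096, 4096, batch=32, workers=256, fully_connected=True))
```

Under this model the same layer uses SFB on 8 workers but falls back to PS on 256. This illustrates why the choice depends on both layer properties and the number of machines.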
Findings
Achieves a 15.5x speed-up on 16 GPUs with Caffe and TensorFlow.
Attains a 31.5x speed-up on 32 GPUs with TensorFlow on Inception-V3.
Reduces network communication burstiness and synchronization costs.
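Scaling efficiency is the speed-up divided by the number of GPUs. It puts these findings on a common scale: 1.0 means perfectly linear scaling. The helper name below is illustrative, and the inputs are the figures listed above.

```python
def scaling_efficiency(speedup: float, num_gpus: int) -> float:
    """Fraction of ideal linear speed-up achieved (1.0 means perfectly linear)."""
    if num_gpus <= 0:
        raise ValueError("num_gpus must be positive")
    return speedup / num_gpus


# Speed-ups as reported in the Findings.
for label, speedup, gpus in [
    ("16 GPUs (Caffe and TensorFlow)", 15.5, 16),
    ("32 GPUs (TensorFlow, Inception-V3)", 31.5, 32),
]:
    print(f"{label}: {scaling_efficiency(speedup, gpus):.1%}")
```

This prints 96.9% for the 16-GPU result and 98.4% for the 32-GPU result.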
Abstract
Deep learning models can take weeks to train on a single GPU-equipped machine, necessitating scaling out DL training to a GPU-cluster. However, current distributed DL implementations can scale poorly due to substantial parameter synchronization over the network, because the high throughput of GPUs allows more data batches to be processed per unit time than CPUs, leading to more frequent network synchronization. We present Poseidon, an efficient communication architecture for distributed DL on GPUs. Poseidon exploits the layered model structures in DL programs to overlap communication and computation, reducing bursty network communication. Moreover, Poseidon uses a hybrid communication scheme that optimizes the number of bytes required to synchronize each layer, according to layer properties and the number of machines. We show that Poseidon is applicable to different DL frameworks by…
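The abstract argues that faster GPUs mean more iterations per second, and therefore more synchronization traffic. A back-of-envelope calculation makes this concrete. This sketch assumes each worker pushes a full gradient and pulls full parameters once per iteration, i.e. 2 × model size per iteration. The model size and iteration rates are hypothetical values chosen for illustration, not measurements from the paper.

```python
def required_gbps(num_params: float, bytes_per_param: int, iters_per_sec: float) -> float:
    """Per-worker bandwidth in Gbit/s needed to push full gradients and pull
    full parameters once per iteration (simplified, no compression or overlap)."""
    bytes_per_iter = 2 * num_params * bytes_per_param
    return bytes_per_iter * iters_per_sec * 8 / 1e9


# Hypothetical 100M-parameter float32 model at two hypothetical iteration rates.
for rate in (1.0, 5.0):
    print(f"{rate} iter/s -> {required_gbps(100e6, 4, rate):.1f} Gbit/s")
```

Under these assumptions, going from 1 to 5 iterations per second raises demand from 6.4 to 32.0 Gbit/s. The higher figure is more than a 10 Gbit/s Ethernet link can carry.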
