Parallax: Sparsity-aware Data Parallel Training of Deep Neural Networks
Soojeong Kim, Gyeong-In Yu, Hojin Park, Sungwoo Cho, Eunji Jeong, Hyeonmin Ha, Sanha Lee, Joo Seong Jeong, Byung-Gon Chun

TL;DR
Parallax is a novel framework that enhances distributed deep learning training by leveraging model sparsity, significantly improving scalability and speed for NLP models on multi-GPU systems.
Contribution
It introduces a hybrid architecture combining Parameter Server and AllReduce to optimize data transfer based on sparsity, improving scalability for NLP models.
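To make the hybrid idea concrete, here is a minimal single-process sketch in Python. It assumes that sparse variables (for example, embedding tables where each worker touches only a few rows) are aggregated Parameter Server style, and that dense variables are averaged AllReduce style; the summary above does not spell out this assignment. The function names `allreduce_mean`, `ps_aggregate_sparse`, and `route` are hypothetical and are not Parallax's API.

```python
from collections import defaultdict


def allreduce_mean(worker_grads):
    """Element-wise mean of dense gradients across workers.

    This is the result an AllReduce would leave on every worker;
    the communication itself is not simulated here.
    """
    n = len(worker_grads)
    return [sum(vals) / n for vals in zip(*worker_grads)]


def ps_aggregate_sparse(worker_grads, num_workers):
    """Parameter-server-style aggregation of sparse gradients.

    Each worker sends only the {row_index: value} pairs it touched,
    so traffic scales with the number of touched rows rather than
    with the full size of the variable.
    """
    acc = defaultdict(float)
    for grads in worker_grads:
        for row, value in grads.items():
            acc[row] += value
    return {row: total / num_workers for row, total in acc.items()}


def route(variables):
    """Hypothetical routing rule: pick an architecture per variable."""
    return {
        name: "parameter_server" if is_sparse else "allreduce"
        for name, is_sparse in variables.items()
    }


if __name__ == "__main__":
    # Two workers, one dense variable and one sparse variable.
    dense = [[1.0, 2.0], [3.0, 4.0]]
    sparse = [{0: 1.0, 5: 2.0}, {5: 4.0}]
    print(route({"embedding": True, "lstm_kernel": False}))
    print(allreduce_mean(dense))
    print(ps_aggregate_sparse(sparse, num_workers=2))
```

In this toy setup the sparse path moves three (row, value) pairs in total, regardless of how many rows the variable has, which is the kind of data-transfer saving the contribution statement refers to.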
Findings
Achieves up to 2.8x speedup over TensorFlow on NLP models.
Achieves up to 6.02x speedup over Horovod on NLP models.
Maintains comparable or better performance on dense models.
Abstract
The employment of high-performance servers and GPU accelerators for training deep neural network models has greatly accelerated recent advances in deep learning (DL). DL frameworks, such as TensorFlow, MXNet, and Caffe2, have emerged to help DL researchers train their models in a distributed manner. Although current DL frameworks scale well for image classification models, there remain opportunities for scalable distributed training on natural language processing (NLP) models. We found that current frameworks show relatively low scalability when training NLP models because they do not account for differences in the sparsity of model parameters. In this paper, we propose Parallax, a framework that optimizes data parallel training by utilizing the sparsity of model parameters. Parallax introduces a hybrid approach that combines Parameter Server and AllReduce architectures to…
Taxonomy
Topics: Advanced Neural Network Applications · Stochastic Gradient Optimization Techniques · Neural Networks and Applications
