Parallax: Sparsity-aware Data Parallel Training of Deep Neural Networks

Soojeong Kim; Gyeong-In Yu; Hojin Park; Sungwoo Cho; Eunji Jeong; Hyeonmin Ha; Sanha Lee; Joo Seong Jeong; Byung-Gon Chun

arXiv:1808.02621·cs.DC·June 11, 2019·6 cites

TL;DR

Parallax is a framework that speeds up distributed deep learning training by exploiting the sparsity of model parameters, reporting speedups of up to 2.8x over TensorFlow and up to 6.02x over Horovod for NLP models on multi-GPU systems.

Contribution

Parallax introduces a hybrid architecture that combines Parameter Server and AllReduce, choosing how to transfer each parameter's data according to its sparsity, which improves scalability for NLP models.
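The routing idea behind such a hybrid can be sketched in a few lines. This is a minimal illustration, not Parallax's implementation: it assumes sparse variables (for example, embedding tables) are handled by Parameter Server and dense variables by AllReduce, and the `Variable` class, the `assign_architecture` function, and the example shapes are all hypothetical names and values chosen for the sketch.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Variable:
    """Hypothetical description of one model variable."""
    name: str
    num_elements: int
    is_sparse: bool  # True when each step's gradient touches only a few rows


def assign_architecture(variables: List[Variable]) -> Dict[str, str]:
    """Map each variable name to "ps" (Parameter Server) or "allreduce".

    Assumption for this sketch: sparse variables go to Parameter Server,
    dense variables go to AllReduce.
    """
    return {v.name: ("ps" if v.is_sparse else "allreduce") for v in variables}


# Illustrative model: one sparse embedding table and two dense variables.
model = [
    Variable("embedding", 800_000 * 512, is_sparse=True),
    Variable("lstm/kernel", 1024 * 4096, is_sparse=False),
    Variable("softmax/bias", 800_000, is_sparse=False),
]

plan = assign_architecture(model)
for name, arch in plan.items():
    print(f"{name}: {arch}")
```

The point of the sketch is only that the decision is made per variable rather than per model, so a single NLP model can use both communication patterns at once.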

Findings

01 Achieves up to 2.8x speedup over TensorFlow on NLP models.

02 Achieves up to 6.02x speedup over Horovod on NLP models.

03 Maintains comparable or better performance on dense models.

Abstract

The employment of high-performance servers and GPU accelerators for training deep neural network models has greatly accelerated recent advances in deep learning (DL). DL frameworks, such as TensorFlow, MXNet, and Caffe2, have emerged to help DL researchers train their models in a distributed manner. Although current DL frameworks scale well for image classification models, there remain opportunities for scalable distributed training on natural language processing (NLP) models. We found that current frameworks show relatively low scalability when training NLP models because they do not account for differences in the sparsity of model parameters. In this paper, we propose Parallax, a framework that optimizes data parallel training by utilizing the sparsity of model parameters. Parallax introduces a hybrid approach that combines Parameter Server and AllReduce architectures to…
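To see why parameter sparsity matters for communication, consider the gradient of an embedding table: only the rows looked up in a batch receive nonzero gradients. The sketch below compares the bytes needed to send that gradient as a full dense tensor against sending only the touched rows plus their indices. The function names, the 4-byte values, the 8-byte indices, and the table sizes are illustrative assumptions, not figures from the paper.

```python
def dense_grad_bytes(vocab_size: int, embed_dim: int, bytes_per_value: int = 4) -> int:
    """Bytes to send the whole gradient tensor, zeros included."""
    return vocab_size * embed_dim * bytes_per_value


def sparse_grad_bytes(
    unique_rows: int,
    embed_dim: int,
    bytes_per_value: int = 4,
    bytes_per_index: int = 8,
) -> int:
    """Bytes to send only the touched rows, each with one row index."""
    return unique_rows * (embed_dim * bytes_per_value + bytes_per_index)


# Hypothetical sizes: a 100k-word vocabulary, 512-dim embeddings,
# and a batch that touches 2,000 distinct words.
dense = dense_grad_bytes(vocab_size=100_000, embed_dim=512)
sparse = sparse_grad_bytes(unique_rows=2_000, embed_dim=512)

print(f"dense:  {dense:,} bytes")
print(f"sparse: {sparse:,} bytes")
print(f"ratio:  {dense / sparse:.1f}x")
```

Under these assumed sizes the dense representation is far larger than the sparse one, which is the kind of gap a sparsity-aware framework can exploit.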

Code & Models

Repositories

snuspl/parallax (TensorFlow, official)

Taxonomy

Topics: Advanced Neural Network Applications · Stochastic Gradient Optimization Techniques · Neural Networks and Applications