Experiments on Parallel Training of Deep Neural Network using Model Averaging
Hang Su, Haoyu Chen

TL;DR
This paper explores parallel training of deep neural networks using model averaging, demonstrating significant speedups on large datasets with minimal accuracy loss by leveraging multiple GPUs and MPI communication.
Contribution
It applies model averaging to parallel DNN training across multiple GPUs, and investigates how NG-SGD and RBM pretraining affect training within that framework.
Findings
Achieved 9.3x speedup with 16 GPUs on Switchboard dataset.
Achieved 17x speedup with 32 GPUs with minimal accuracy loss.
Validated effectiveness of NG-SGD and RBM pretraining in parallel training.
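To put the reported speedups in context, the short sketch below computes parallel efficiency, meaning speedup divided by GPU count, from the two figures listed above. The function name `parallel_efficiency` and the dictionary layout are illustrative choices, not anything taken from the paper. The two results work out to roughly 58% and 53% of ideal linear scaling.

```python
def parallel_efficiency(speedup, n_gpus):
    """Fraction of ideal linear speedup actually achieved."""
    return speedup / n_gpus

# Speedups as reported in the findings above: {number of GPUs: speedup}.
reported = {16: 9.3, 32: 17.0}

efficiency = {n: parallel_efficiency(s, n) for n, s in reported.items()}

for n_gpus, eff in efficiency.items():
    print(f"{n_gpus} GPUs: {eff:.3f} of ideal linear speedup")
```

Doubling the GPU count from 16 to 32 lowers efficiency only slightly in this calculation, which is consistent with the summary's description of the approach as scalable.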
Abstract
In this work we apply model averaging to parallel training of deep neural networks (DNNs). Data is partitioned and distributed to different nodes for local model updates, and the models are averaged across nodes every few minibatches. We use multiple GPUs for data parallelization and the Message Passing Interface (MPI) for communication between nodes, which allows us to perform model averaging frequently without losing much time on communication. We investigate the effectiveness of Natural Gradient Stochastic Gradient Descent (NG-SGD) and Restricted Boltzmann Machine (RBM) pretraining for parallel training in the model-averaging framework, and explore the best setups in terms of learning rate schedules, averaging frequencies and minibatch sizes. It is shown that NG-SGD and RBM pretraining benefit parameter-averaging based model…
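The training loop described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the workers are simulated sequentially in one process rather than on separate GPUs, the model is a small least-squares linear regression rather than a DNN, plain SGD stands in for NG-SGD, and the element-wise mean stands in for the MPI all-reduce that would perform the averaging across nodes. All names (`sgd_step`, `train_model_averaging`, `avg_every`) and hyperparameter values are hypothetical.

```python
import random

def sgd_step(w, batch, lr):
    """One minibatch SGD step for least-squares linear regression."""
    grad = [0.0] * len(w)
    for x, y in batch:
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        for j, xj in enumerate(x):
            grad[j] += 2.0 * err * xj / len(batch)
    return [wi - lr * gj for wi, gj in zip(w, grad)]

def train_model_averaging(data, n_workers=4, avg_every=5, batch_size=16,
                          lr=0.05, n_rounds=40, seed=0):
    rng = random.Random(seed)
    data = data[:]
    rng.shuffle(data)
    # Partition the data: each simulated worker gets a disjoint shard.
    shards = [data[i::n_workers] for i in range(n_workers)]
    w_global = [0.0] * len(data[0][0])
    for _ in range(n_rounds):
        local_models = []
        for shard in shards:
            # Every worker starts the round from the current averaged model.
            w = list(w_global)
            # Local updates for a few minibatches, with no communication.
            for _ in range(avg_every):
                w = sgd_step(w, rng.sample(shard, batch_size), lr)
            local_models.append(w)
        # Model averaging: element-wise mean of the workers' parameters
        # (the role an all-reduce would play in a real multi-node setup).
        w_global = [sum(ws) / n_workers for ws in zip(*local_models)]
    return w_global

# Synthetic regression problem with known weights (illustrative data only).
data_rng = random.Random(1)
true_w = [1.0, -2.0, 0.5]
data = []
for _ in range(1024):
    x = [data_rng.gauss(0.0, 1.0) for _ in true_w]
    y = sum(wi * xi for wi, xi in zip(true_w, x)) + data_rng.gauss(0.0, 0.01)
    data.append((x, y))

w_hat = train_model_averaging(data)
print(w_hat)
```

The `avg_every` parameter corresponds to the averaging frequency the abstract mentions as a tuning choice: smaller values keep the local models closer together at the cost of more communication rounds.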
Taxonomy
Topics: Neural Networks and Applications · Image Processing and 3D Reconstruction · Face and Expression Recognition
