DBS: Dynamic Batch Size For Distributed Deep Neural Network Training

Qing Ye; Yuhao Zhou; Mingjia Shi; Yanan Sun; Jiancheng Lv

arXiv:2007.11831 · cs.LG · November 4, 2022 · 6 citations

TL;DR

This paper introduces a Dynamic Batch Size (DBS) strategy for distributed DNN training that adjusts batch sizes based on worker performance, improving cluster utilization and reducing training time.
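A minimal sketch of what "adjusting batch sizes based on worker performance" could look like, assuming each worker's share of a fixed global batch is set in proportion to its measured throughput (samples per second). The helper name `allocate_batch_sizes` and the proportional rule are illustrative assumptions, not the paper's or the repository's API.

```python
from typing import List


def allocate_batch_sizes(global_batch: int, throughputs: List[float]) -> List[int]:
    """Split a fixed global batch across workers in proportion to throughput.

    Hypothetical helper (assumption): `throughputs` are samples/second measured
    for each worker, e.g. over the previous epoch. The largest-remainder method
    keeps the integer batch sizes summing exactly to `global_batch`.
    """
    if global_batch < 0:
        raise ValueError("global_batch must be non-negative")
    if not throughputs or any(t <= 0 for t in throughputs):
        raise ValueError("throughputs must be a non-empty list of positive numbers")
    total = sum(throughputs)
    # Ideal (fractional) share of the global batch for each worker.
    shares = [global_batch * t / total for t in throughputs]
    # Floor each share; shares are non-negative, so int() truncation is a floor.
    sizes = [int(s) for s in shares]
    # Give the leftover samples to the workers with the largest fractional parts.
    leftover = global_batch - sum(sizes)
    order = sorted(range(len(shares)), key=lambda i: shares[i] - sizes[i], reverse=True)
    for i in order[:leftover]:
        sizes[i] += 1
    return sizes
```

For example, `allocate_batch_sizes(256, [100.0, 100.0, 200.0])` returns `[64, 64, 128]`: the worker that is twice as fast receives twice as many samples, so all three should finish their local step at roughly the same time.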

Contribution

The paper proposes a novel DBS strategy that dynamically adjusts batch sizes in distributed training, enhancing efficiency and robustness over fixed-size approaches.

Findings

1. Reduces training time significantly.
2. Improves cluster utilization.
3. Maintains convergence guarantees.
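When workers use different batch sizes, a plain average of their mean gradients over-weights the workers with small batches. One standard way to keep each update equal to the mean gradient over the whole global batch is to weight each worker by its batch size. This is a sketch of that general technique under that assumption; whether DBS aggregates gradients exactly this way is not stated in this summary, and `aggregate_gradients` is a hypothetical name.

```python
from typing import List, Sequence


def aggregate_gradients(worker_grads: Sequence[Sequence[float]],
                        batch_sizes: Sequence[int]) -> List[float]:
    """Combine per-worker mean gradients into one global mean gradient.

    Worker i reports the mean gradient over its own b_i samples. Weighting by
    b_i / sum(b) makes the result equal the mean over every sample in the
    global batch, which an unweighted average does not when the b_i differ.
    """
    if len(worker_grads) != len(batch_sizes) or not worker_grads:
        raise ValueError("need one non-empty gradient list per batch size")
    total = sum(batch_sizes)
    dim = len(worker_grads[0])
    out = [0.0] * dim
    for grad, b in zip(worker_grads, batch_sizes):
        for k in range(dim):
            out[k] += grad[k] * b / total
    return out
```

For instance, if one worker averaged three per-sample gradients equal to 1.0 and another averaged a single gradient of 5.0, `aggregate_gradients([[1.0], [5.0]], [3, 1])` returns `[2.0]`, the true mean of the four samples, whereas an unweighted average would give 3.0.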

Abstract

Synchronous strategies with data parallelism, such as Synchronous Stochastic Gradient Descent (S-SGD) and model averaging methods, are widely used in the distributed training of Deep Neural Networks (DNNs), largely owing to their easy implementation and promising performance. In particular, each worker in the cluster hosts a copy of the DNN and an evenly divided share of the dataset, and trains with a fixed mini-batch size so that DNN training converges. In these strategies, workers with different computational capabilities must wait for each other because of synchronization and network transmission delays, which inevitably causes the high-performance workers to waste computation. Consequently, cluster utilization is relatively low. To alleviate this issue, we propose the Dynamic Batch Size (DBS) strategy for the distributed training of DNNs. Specifically, the…
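The waste described in the abstract can be made concrete with a toy model of one synchronous step. This sketch assumes that worker i needs `batch_sizes[i] / throughputs[i]` seconds of compute, ignores communication cost, and has every worker wait at a barrier for the slowest one; `step_utilization` is a hypothetical helper, not code from the paper.

```python
from typing import Sequence


def step_utilization(batch_sizes: Sequence[int], throughputs: Sequence[float]) -> float:
    """Fraction of total cluster time spent computing in one synchronous step.

    Toy model (assumption): compute time is batch size / throughput, network
    delays are ignored, and all workers idle until the slowest one finishes.
    """
    if len(batch_sizes) != len(throughputs) or not batch_sizes:
        raise ValueError("need one throughput per batch size")
    busy = [b / t for b, t in zip(batch_sizes, throughputs)]
    # Synchronization barrier: the slowest worker determines the step time.
    step_time = max(busy)
    return sum(busy) / (len(busy) * step_time)
```

With throughputs `[1.0, 1.0, 2.0]` samples per second, an even split `[4, 4, 4]` gives a utilization of about 0.83 because the fast worker idles half the step, while a throughput-proportional split `[3, 3, 6]` gives 1.0.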

Code & Models

Repositories

Soptq/Dynamic_Batch-Size_DistributedDNN (PyTorch, official)

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Advanced Neural Network Applications · Machine Learning and ELM