DBS: Dynamic Batch Size For Distributed Deep Neural Network Training
Qing Ye, Yuhao Zhou, Mingjia Shi, Yanan Sun, Jiancheng Lv

TL;DR
This paper introduces a Dynamic Batch Size (DBS) strategy for distributed DNN training that adjusts batch sizes based on worker performance, improving cluster utilization and reducing training time.
Contribution
The paper proposes a novel DBS strategy that dynamically adjusts batch sizes in distributed training, enhancing efficiency and robustness over fixed-size approaches.
Findings
Reduces training time significantly.
Improves cluster utilization.
Maintains convergence guarantees.
Abstract
Synchronous strategies with data parallelism, such as the Synchronous Stochastic Gradient Descent (S-SGD) and the model averaging methods, are widely utilized in distributed training of Deep Neural Networks (DNNs), largely owing to its easy implementation yet promising performance. Particularly, each worker of the cluster hosts a copy of the DNN and an evenly divided share of the dataset with the fixed mini-batch size, to keep the training of DNNs convergence. In the strategies, the workers with different computational capability, need to wait for each other because of the synchronization and delays in network transmission, which will inevitably result in the high-performance workers wasting computation. Consequently, the utilization of the cluster is relatively low. To alleviate this issue, we propose the Dynamic Batch Size (DBS) strategy for the distributed training of DNNs. Specifically, the…
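The abstract is cut off before it describes the method, so the following is only a minimal sketch of the general idea (giving faster workers larger batches), not the paper's algorithm. It assumes each worker reports how long its last round took, and it splits a fixed global batch in proportion to measured throughput. The function name `rebalance_batch_sizes` and its parameters are hypothetical.

```python
from math import floor


def rebalance_batch_sizes(step_times, batch_sizes, global_batch):
    """Split global_batch across workers in proportion to measured throughput.

    step_times[i]  : seconds worker i needed for its last round (assumed measured)
    batch_sizes[i] : samples worker i processed in that round
    global_batch   : total number of samples per synchronous step (kept constant)
    """
    # Throughput in samples per second for each worker.
    throughputs = [b / t for b, t in zip(batch_sizes, step_times)]
    total = sum(throughputs)

    # Ideal fractional share of the global batch for each worker.
    ideal = [global_batch * tp / total for tp in throughputs]

    # Round down, then hand the leftover samples to the workers with the
    # largest fractional remainders so the sizes still sum to global_batch.
    sizes = [floor(x) for x in ideal]
    leftover = global_batch - sum(sizes)
    order = sorted(range(len(ideal)), key=lambda i: ideal[i] - sizes[i], reverse=True)
    for i in order[:leftover]:
        sizes[i] += 1
    return sizes


# Three workers started with equal batches; worker 0 is the fastest.
new_sizes = rebalance_batch_sizes(
    step_times=[1.0, 2.0, 4.0],
    batch_sizes=[64, 64, 64],
    global_batch=192,
)
print(new_sizes)
```

With per-worker batches of different sizes, a plain average of worker gradients no longer equals the gradient over the global batch, so a sketch like this would typically be paired with an average weighted by batch size. Whether the paper does this is not stated in the text above.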
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Advanced Neural Network Applications · Machine Learning and ELM
