DHO$_2$: Accelerating Distributed Hybrid Order Optimization via Model Parallelism and ADMM
Shunxian Gu, Chaoqun You, Bangbang Ren, Lailong Luo, Junxu Xia, Deke Guo

TL;DR
This paper introduces DHO$_2$, a distributed hybrid order optimizer that accelerates deep neural network training by combining model parallelism with ADMM, reducing memory use and training time in resource-constrained environments.
Contribution
The paper proposes a distributed design for FOSI, a hybrid order optimizer, in which curvature computation and model updates are parallelized across different devices to shorten training time.
Findings
Achieves near-linear memory reduction with more devices.
Provides 1.4x to 2.1x speedup over traditional distributed optimizers.
Effectively accelerates DNN training in resource-limited settings.
Abstract
Scaling deep neural network (DNN) training to more devices can reduce time-to-solution. However, it is impractical for users with limited computing resources. FOSI, as a hybrid order optimizer, converges faster than conventional optimizers by taking advantage of both gradient information and curvature information when updating the DNN model. Therefore, it provides a new chance for accelerating DNN training in the resource-constrained setting. In this paper, we explore its distributed design, namely DHO$_2$, including distributed calculation of curvature information and model update with partial curvature information to accelerate DNN training with a low memory burden. To further reduce the training time, we design a novel strategy to parallelize the calculation of curvature information and the model update on different devices. Experimentally, our distributed design can achieve an…
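To make the idea of a hybrid order update concrete, here is a minimal single-device sketch on a toy quadratic. It is not the paper's distributed algorithm: the function names (matvec, top_eigenpair, hybrid_step), the test matrix, and the learning rates are all hypothetical, and plain power iteration is swapped in as a simple stand-in for whatever curvature estimator the optimizer actually uses. The sketch takes a Newton-like step along one estimated high-curvature direction and an ordinary gradient step in the remaining directions.

```python
import math

def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def top_eigenpair(A, iters=200):
    """Power iteration (illustrative stand-in): dominant eigenpair of a symmetric PSD matrix."""
    v = [1.0] * len(A)
    for _ in range(iters):
        w = matvec(A, v)
        norm = math.sqrt(dot(w, w))
        v = [x / norm for x in w]
    return dot(v, matvec(A, v)), v

def grad(A, b, x):
    """Gradient of f(x) = 0.5 * x^T A x - b^T x."""
    return [ax - bi for ax, bi in zip(matvec(A, x), b)]

def loss(A, b, x):
    return 0.5 * dot(x, matvec(A, x)) - dot(b, x)

def hybrid_step(A, b, x, lam, v, lr):
    """Curvature-scaled step along v, plain gradient step in the complement of v."""
    g = grad(A, b, x)
    c = dot(g, v)                                   # gradient component along v
    g_rest = [gi - c * vi for gi, vi in zip(g, v)]  # gradient with that component removed
    return [xi - (c / lam) * vi - lr * ri for xi, vi, ri in zip(x, v, g_rest)]

# Toy ill-conditioned problem (assumed for illustration): one direction has curvature 100.
A = [[100.0, 0.0, 0.0], [0.0, 2.0, 0.5], [0.0, 0.5, 1.0]]
b = [1.0, 1.0, 1.0]
lam, v = top_eigenpair(A)

x_hybrid = [0.0, 0.0, 0.0]
x_gd = [0.0, 0.0, 0.0]
for _ in range(50):
    # With the stiff direction handled by curvature, the remaining directions tolerate a larger step.
    x_hybrid = hybrid_step(A, b, x_hybrid, lam, v, lr=0.4)
    # Plain gradient descent must keep lr below 2/100 to stay stable.
    x_gd = [xi - 0.019 * gi for xi, gi in zip(x_gd, grad(A, b, x_gd))]

print(loss(A, b, x_hybrid), loss(A, b, x_gd))
```

On this toy problem the hybrid update reaches a lower loss than plain gradient descent in the same number of steps, because the largest curvature no longer caps the step size used in the other directions. How the curvature work is split across devices and overlapped with model updates is the subject of the paper itself and is not modeled here.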
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Advanced Neural Network Applications · Neural Networks and Reservoir Computing
