Adaptive Consensus Gradients Aggregation for Scaled Distributed Training

Yoni Choukroun; Shlomi Azoulay; Pavel Kisilev

arXiv:2411.03742 · cs.LG · November 7, 2024



TL;DR

This paper proposes an adaptive gradient aggregation method for distributed deep learning that improves convergence and performance by optimizing gradient weighting and introducing subspace momentum, outperforming simple averaging.

Contribution

It introduces a novel subspace optimization framework for gradient aggregation, including an adaptive weighting scheme and subspace momentum, enhancing efficiency and accuracy in distributed training.
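To make the idea of non-uniform gradient weighting concrete, here is a minimal Python sketch that contrasts plain averaging with a consensus-style weighting. The rule used here is an illustrative assumption, not necessarily the scheme derived in the paper or implemented in yonilc/adacons: each worker is weighted by the inner product of its gradient with the mean gradient, and the weights are normalized to sum to one. The helper names (consensus_weights, aggregate, mean_gradient) are hypothetical.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def mean_gradient(grads: List[Vector]) -> Vector:
    # Plain averaging: the standard aggregation baseline.
    n = len(grads)
    return [sum(col) / n for col in zip(*grads)]


def consensus_weights(grads: List[Vector], eps: float = 1e-12) -> List[float]:
    """Hypothetical weighting rule (an assumption, not the paper's exact scheme):
    score each worker by the projection of its gradient onto the mean gradient,
    then normalize the scores so they sum to one."""
    n = len(grads)
    g_bar = mean_gradient(grads)
    raw = [dot(g, g_bar) for g in grads]
    total = sum(raw)  # equals n * ||g_bar||^2, so it is never negative
    if total <= eps:
        # Mean gradient is (numerically) zero: fall back to plain averaging.
        return [1.0 / n] * n
    return [r / total for r in raw]


def aggregate(grads: List[Vector], weights: List[float]) -> Vector:
    # Weighted combination of the worker gradients.
    return [sum(w * g for w, g in zip(weights, col)) for col in zip(*grads)]


if __name__ == "__main__":
    grads = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]  # third worker disagrees
    w = consensus_weights(grads)
    print("weights:", w)
    print("consensus aggregate:", aggregate(grads, w))
    print("plain average:", mean_gradient(grads))
```

With worker gradients [1, 0], [1, 0] and [0, 1], this sketch assigns weights of about 0.4, 0.4 and 0.2, giving an aggregate of about [0.8, 0.2] instead of the plain mean of about [0.667, 0.333]. Because the raw scores sum to N times the squared norm of the mean gradient, the weights reduce to exactly 1/N when all workers agree, so plain averaging is recovered as a special case.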

Findings

1. Outperforms standard gradient averaging on MLPerf benchmarks
2. Reduces communication and computational costs
3. Accelerates convergence with subspace momentum

Abstract

Distributed machine learning has recently become a critical paradigm for training large models on vast datasets. We examine the stochastic optimization problem for deep learning within synchronous parallel computing environments under communication constraints. While averaging distributed gradients is the most widely used method for gradient estimation, whether this is the optimal strategy remains an open question. In this work, we analyze the distributed gradient aggregation process through the lens of subspace optimization. By formulating the aggregation problem as an objective-aware subspace optimization problem, we derive an efficient weighting scheme for gradients, guided by subspace coefficients. We further introduce subspace momentum to accelerate convergence while maintaining statistical unbiasedness in the aggregation. Our method demonstrates improved performance over the…
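Under one assumed interpretation, the subspace momentum described above can be illustrated as an exponential moving average over the N aggregation coefficients rather than over full parameter-sized vectors. The sketch below reuses a hypothetical consensus rule (each worker scored by its inner product with the mean gradient, normalized to sum to one); the class name SubspaceMomentum and the parameter beta are illustrative and are not taken from the paper or its repository.

```python
from typing import List, Optional

Vector = List[float]


def raw_consensus_weights(grads: List[Vector]) -> List[float]:
    # Hypothetical rule (assumption): project each worker gradient onto the
    # mean gradient and normalize the scores so they sum to one.
    n = len(grads)
    g_bar = [sum(col) / n for col in zip(*grads)]
    raw = [sum(x * y for x, y in zip(g, g_bar)) for g in grads]
    total = sum(raw)  # equals n * ||g_bar||^2 >= 0
    if total <= 1e-12:
        return [1.0 / n] * n  # degenerate case: plain averaging
    return [r / total for r in raw]


class SubspaceMomentum:
    """Hypothetical momentum kept in the N-dimensional coefficient space."""

    def __init__(self, num_workers: int, beta: float = 0.9):
        self.n = num_workers
        self.beta = beta
        self.state: Optional[List[float]] = None  # smoothed coefficients

    def step(self, grads: List[Vector]) -> Vector:
        if len(grads) != self.n:
            raise ValueError("expected one gradient per worker")
        alpha = raw_consensus_weights(grads)
        if self.state is None:
            self.state = alpha
        else:
            # EMA of coefficient vectors; each input sums to one, so the
            # average sums to one as well.
            self.state = [self.beta * m + (1.0 - self.beta) * a
                          for m, a in zip(self.state, alpha)]
        # Combine the current worker gradients with the smoothed coefficients.
        return [sum(w * g for w, g in zip(self.state, col)) for col in zip(*grads)]


if __name__ == "__main__":
    mom = SubspaceMomentum(num_workers=3, beta=0.5)
    print(mom.step([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]))
    print(mom.step([[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]))
    print("smoothed coefficients:", mom.state)
```

Keeping the momentum in coefficient space costs only N floats per step regardless of model size. Since every coefficient vector sums to one, the moving average does too, so the aggregate remains an affine combination of the worker gradients.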


Code & Models

Repositories

yonilc/adacons
PyTorch · Official


Taxonomy

Topics: Stochastic Gradient Optimization Techniques