Communication-Efficient, 2D Parallel Stochastic Gradient Descent for Distributed-Memory Optimization

Aditya Devarakonda; Ramakrishnan Kannan

arXiv:2501.07526 · cs.DC · January 14, 2025


TL;DR

This paper introduces HybridSGD, a 2D parallel stochastic gradient descent method that reduces communication cost and improves scalability on distributed-memory systems. It converges faster than FedAvg at similar scales and runs up to 5.3x faster than s-step SGD and up to 121x faster than FedAvg.

Contribution

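The paper generalizes 1D s-step SGD and 1D FedAvg into a single 2D parallel method. It supports the method with a theoretical analysis of the convergence, computation, communication, and memory trade-offs, and with an empirical evaluation of performance and scalability.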

Findings

1. HybridSGD converges faster than FedAvg at similar scales.
2. HybridSGD achieves up to a 5.3x speedup over s-step SGD.
3. HybridSGD achieves up to a 121x speedup over FedAvg.

Abstract

Distributed-memory implementations of numerical optimization algorithms, such as stochastic gradient descent (SGD), require interprocessor communication at every iteration of the algorithm. On modern distributed-memory clusters where communication is more expensive than computation, the scalability and performance of these algorithms are limited by communication cost. This work generalizes prior work on 1D $s$-step SGD and 1D Federated SGD with Averaging (FedAvg) to yield a 2D parallel SGD method (HybridSGD) which attains a continuous performance trade-off between the two baseline algorithms. We present theoretical analysis which shows the convergence, computation, communication, and memory trade-offs between $s$-step SGD, FedAvg, 2D parallel SGD, and other parallel SGD variants. We implement all algorithms in C++ and MPI and evaluate their performance on a Cray EX supercomputing system.…
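
To make the FedAvg baseline named in the abstract concrete, here is a minimal serial sketch of local SGD with periodic model averaging, applied to logistic regression. It is an illustration under stated assumptions, not the paper's C++/MPI implementation. The function names (`local_sgd`, `fedavg`, `make_shards`, `accuracy`), the synthetic data, and all hyperparameters are hypothetical. The workers are simulated in a loop, and a plain average stands in for the collective communication an MPI code would use. The s-step SGD baseline and the 2D HybridSGD combination are not shown.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def local_sgd(w, shard, steps, lr, rng):
    # Run `steps` single-sample SGD updates for logistic regression on one shard.
    w = list(w)
    for _ in range(steps):
        x, y = shard[rng.randrange(len(shard))]
        err = sigmoid(sum(wi * xi for wi, xi in zip(w, x))) - y
        w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w

def fedavg(shards, dim, rounds, local_steps, lr, seed=0):
    # Assumption: one shard per simulated worker, equal shard sizes.
    rng = random.Random(seed)
    w = [0.0] * dim
    for _ in range(rounds):
        # Each simulated worker starts from the shared model and trains locally.
        local_models = [local_sgd(w, s, local_steps, lr, rng) for s in shards]
        # The average stands in for the single communication step per round.
        w = [sum(m[j] for m in local_models) / len(local_models)
             for j in range(dim)]
    return w

def make_shards(num_workers, per_worker, seed=1):
    # Hypothetical synthetic data: label is 1 when x0 + x1 > 0.
    # The third feature is a constant bias term.
    rng = random.Random(seed)
    shards = []
    for _ in range(num_workers):
        shard = []
        for _ in range(per_worker):
            x0, x1 = rng.uniform(-1, 1), rng.uniform(-1, 1)
            shard.append(([x0, x1, 1.0], 1.0 if x0 + x1 > 0 else 0.0))
        shards.append(shard)
    return shards

def accuracy(w, shards):
    # Fraction of training samples whose predicted class matches the label.
    samples = [s for shard in shards for s in shard]
    hits = sum(
        (sigmoid(sum(wi * xi for wi, xi in zip(w, x))) > 0.5) == (y == 1.0)
        for x, y in samples
    )
    return hits / len(samples)

shards = make_shards(num_workers=4, per_worker=50)
w = fedavg(shards, dim=3, rounds=20, local_steps=25, lr=0.5)
```

In this sketch, raising `local_steps` reduces how often the workers synchronize, at the cost of letting the local models drift apart between averages. That is the kind of communication versus convergence trade-off the abstract refers to.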


Taxonomy

Methods: Stochastic Gradient Descent · Logistic Regression