Avoiding Communication in Logistic Regression

Aditya Devarakonda; James Demmel

arXiv:2011.08281 · cs.LG · November 18, 2020

TL;DR

This paper introduces a communication-avoiding variant of SGD for logistic regression. Processes communicate once every s iterations instead of every iteration. This yields speedups of up to 4.97x on high-performance clusters without sacrificing accuracy.

Contribution

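The paper proposes a communication-avoiding SGD technique that reorganizes the computation so that processes communicate every s iterations rather than every iteration. It supports the technique with theoretical bounds on flops, bandwidth, and latency, and with measured speedups.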

Findings

1. Achieves speedups of up to 4.97x on high-performance clusters.
2. Maintains the convergence behavior and accuracy of standard SGD.
3. Provides theoretical bounds on flops, bandwidth, and latency.
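The following is a back-of-the-envelope illustration only. It does not reproduce the paper's bounds. It uses a simple latency-bandwidth model and several assumptions:

- There are H total SGD iterations with mini-batch size b.
- Each communication round costs a latency α plus β per word moved.
- The dependence on the number of processes and all flop costs are ignored.
- The abstract states that standard SGD communicates every iteration, while the new variant communicates every s iterations, where s is a tuning parameter.
- The per-round word counts are assumptions. SGD is taken to exchange b words per round (one partial logit per sampled row). An s-step variant is taken to exchange an sb × sb Gram matrix of the sampled rows plus sb starting logits.

$$
T_{\mathrm{SGD}} \approx H\,(\alpha + \beta b), \qquad
T_{\mathrm{CA\text{-}SGD}} \approx \left\lceil \frac{H}{s} \right\rceil \left(\alpha + \beta\,(s^2 b^2 + s b)\right)
$$

Under these assumptions, the latency term drops from $\alpha H$ to about $\alpha H / s$. The total words moved grow from $\beta H b$ to about $\beta H (s b^2 + b)$, a factor of roughly $sb + 1$. That trade-off is one plausible reason $s$ is exposed as a tuning parameter rather than simply made as large as possible.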

Abstract

Stochastic gradient descent (SGD) is one of the most widely used optimization methods for solving various machine learning problems. SGD solves an optimization problem by iteratively sampling a few data points from the input data, computing gradients for the selected data points, and updating the solution. However, in a parallel setting, SGD requires interprocess communication at every iteration. We introduce a new communication-avoiding technique for solving the logistic regression problem using SGD. This technique re-organizes the SGD computations into a form that communicates every $s$ iterations instead of every iteration, where $s$ is a tuning parameter. We prove theoretical flops, bandwidth, and latency upper bounds for SGD and its new communication-avoiding variant. Furthermore, we show experimental results that illustrate that the new Communication-Avoiding SGD (CA-SGD) method…
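The SGD loop and the "communicate every $s$ iterations" reorganization described above can be illustrated with a minimal single-process sketch. This is not the authors' implementation, and it rests on several assumptions:

- Labels are in {0, 1}, the logistic-loss gradient is averaged over each mini-batch, the mini-batches are pre-sampled, and the step size `eta` is constant.
- The names `sgd` and `ca_sgd` are hypothetical.
- In a distributed run with the features split across processes (an assumed layout), the following products would each require a reduction:
  - the per-iteration logits `A @ w` in `sgd`;
  - the per-block `Y @ Y.T` and `Y @ w` in `ca_sgd`.

The s-step form works because the logits of each later mini-batch in a block can be rebuilt from the starting logits, the Gram matrix, and the earlier residuals. In exact arithmetic it therefore follows the same trajectory as plain SGD.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def sgd(X, y, w0, batches, eta):
    """Plain mini-batch SGD for logistic regression (labels in {0, 1}).

    Each iteration needs the logits A @ w of the sampled rows; under the
    assumed feature-split layout that is one reduction per iteration.
    """
    w = w0.copy()
    for idx in batches:
        A = X[idx]
        r = (sigmoid(A @ w) - y[idx]) / len(idx)  # averaged residual of this batch
        w -= eta * (A.T @ r)
    return w


def ca_sgd(X, y, w0, batches, eta, s):
    """Hypothetical s-step variant: one round of reductions per s iterations."""
    w = w0.copy()
    for start in range(0, len(batches), s):
        block = batches[start:start + s]
        Y = np.vstack([X[idx] for idx in block])  # all rows sampled in this block
        # G and v are the only quantities that would need a reduction
        # under the assumed layout; they are computed once per block.
        G = Y @ Y.T  # Gram matrix of the sampled rows
        v = Y @ w    # logits of every sampled row at the start of the block
        offsets = np.concatenate(([0], np.cumsum([len(idx) for idx in block])))
        r_all = np.zeros(Y.shape[0])
        for j, idx in enumerate(block):
            lo, hi = offsets[j], offsets[j + 1]
            # Logits of batch j under the current iterate, rebuilt from the
            # starting logits and the residuals of batches 0..j-1 via G.
            z = v[lo:hi] - eta * (G[lo:hi, :lo] @ r_all[:lo])
            r_all[lo:hi] = (sigmoid(z) - y[idx]) / len(idx)
        # Apply all updates of the block at once; under the assumed layout
        # each process would update only its own entries of w (local work).
        w -= eta * (Y.T @ r_all)
    return w


# Usage: both functions follow the same trajectory up to rounding error.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))
y = (X @ rng.standard_normal(10) > 0).astype(float)
batches = [rng.choice(200, size=8, replace=False) for _ in range(40)]
w0 = np.zeros(10)
print(np.allclose(sgd(X, y, w0, batches, 0.5), ca_sgd(X, y, w0, batches, 0.5, s=4)))  # True
```

The deferred update `w -= eta * (Y.T @ r_all)` is the design choice that removes the per-iteration communication. Intermediate iterates are never formed explicitly. Only their effect on the logits of later mini-batches is tracked, through `G`.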

Taxonomy

Methods: Stochastic Gradient Descent · Logistic Regression