Fast Convergence for Stochastic and Distributed Gradient Descent in the Interpolation Limit

Partha P Mitra

arXiv:1803.02922 · stat.ML · May 30, 2018

TL;DR

This paper introduces a new distributed gradient descent algorithm that converges rapidly in the interpolation limit, providing theoretical insights into the efficiency of SGD in high-dimensional deep learning models.

Contribution

It presents a distributed gradient descent method with linear convergence in the interpolation limit, avoiding the need for infinite penalty parameters for consensus.

Findings

01

The distributed algorithm converges linearly, meaning the error decays exponentially in the number of iterations.

02

Convergence rate depends on the smallest nonzero eigenvalue of the sample covariance.

03

In the interpolation limit, consensus is achieved without infinite penalty parameters.
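
The link between convergence rate and the smallest nonzero eigenvalue can be illustrated on a toy problem. The sketch below is not the paper's distributed algorithm; it is plain full-batch gradient descent on an overparameterized, noiseless linear least-squares problem, chosen as an assumption because an interpolating solution is guaranteed to exist there. The function name `gd_interpolation`, the problem sizes, and the step size `1 / lam_max` are all illustrative choices. In this setting the residual `X w - y` contracts by at least a factor `1 - eta * lam_min` per step, where `lam_min` is the smallest nonzero eigenvalue of the sample covariance.

```python
import numpy as np

def gd_interpolation(n=20, d=100, steps=100, seed=0):
    """Full-batch gradient descent on (1/2n)||Xw - y||^2 with d > n (toy setup)."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d))
    w_true = rng.standard_normal(d)
    y = X @ w_true  # noiseless labels, so the data can be fit exactly

    # The nonzero eigenvalues of the sample covariance X^T X / n coincide
    # with the eigenvalues of the n x n matrix X X^T / n.
    eigs = np.linalg.eigvalsh(X @ X.T / n)  # ascending order
    lam_min, lam_max = eigs[0], eigs[-1]

    eta = 1.0 / lam_max          # illustrative step size
    rate = 1.0 - eta * lam_min   # per-step contraction bound on the residual

    w = np.zeros(d)
    residuals = []
    for _ in range(steps):
        r = X @ w - y                    # training residual
        residuals.append(np.linalg.norm(r))
        w -= eta * (X.T @ r) / n         # gradient step
    return np.array(residuals), rate

residuals, rate = gd_interpolation()
print("contraction bound per step:", rate)
print("first and last residual norms:", residuals[0], residuals[-1])
```

Plotting `residuals` on a logarithmic axis should give a curve that stays on or below a straight line of slope `log(rate)`, which is the geometric picture behind the term "linear convergence".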

Abstract

Modern supervised learning techniques, particularly those using deep nets, involve fitting high dimensional labelled data sets with functions containing very large numbers of parameters. Much of this work is empirical. Interesting phenomena have been observed that require theoretical explanations; however, the non-convexity of the loss functions complicates the analysis. Recently it has been proposed that the success of these techniques rests partly in the effectiveness of the simple stochastic gradient descent algorithm in the so-called interpolation limit in which all labels are fit perfectly. This analysis is made possible since the SGD algorithm reduces to a stochastic linear system near the interpolating minimum of the loss function. Here we exploit this insight by presenting and analyzing a new distributed algorithm for gradient descent, also in the interpolating limit. The…

Taxonomy

Methods: Stochastic Gradient Descent