Analysis of gradient descent methods with non-diminishing, bounded errors

Arunselvan Ramaswamy; Shalabh Bhatnagar

arXiv:1604.00151·cs.SY·September 19, 2017

TL;DR

This paper analyzes gradient descent algorithms whose gradient errors are bounded but persistent. It gives sufficient conditions for stability and for convergence to a neighborhood of the minimum, extends previous results, and applies to machine learning scenarios.

Contribution

The paper offers a novel, more general analysis of GD with non-vanishing errors, including a practical implementation using SPSA (simultaneous perturbation stochastic approximation) without restrictive step-size conditions.
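As a rough illustration of the SPSA idea mentioned above, the following sketch estimates a gradient from two function evaluations along a random ±1 direction and plugs it into a GD loop. It is not the paper's implementation: the function names (`spsa_gradient`, `run_spsa`), the test objective `sphere`, the perturbation size `c`, and the step-size schedule are all assumptions chosen for the example.

```python
import random


def spsa_gradient(f, x, c, rng):
    """Two-evaluation SPSA gradient estimate at x with perturbation size c."""
    # Random direction with independent +1/-1 entries (Rademacher).
    delta = [rng.choice((-1.0, 1.0)) for _ in x]
    x_plus = [xi + c * di for xi, di in zip(x, delta)]
    x_minus = [xi - c * di for xi, di in zip(x, delta)]
    diff = f(x_plus) - f(x_minus)
    # Every coordinate reuses the same two function values.
    return [diff / (2.0 * c * di) for di in delta]


def sphere(x):
    """Hypothetical test objective: f(x) = sum of squares, minimized at 0."""
    return sum(xi * xi for xi in x)


def run_spsa(steps=5000, c=0.1, seed=0):
    """GD driven by SPSA estimates; c is held constant for the whole run."""
    rng = random.Random(seed)
    x = [2.0, -3.0]
    for n in range(steps):
        a_n = 1.0 / (n + 10)  # assumed step-size schedule, for illustration only
        g = spsa_gradient(sphere, x, c, rng)
        x = [xi - a_n * gi for xi, gi in zip(x, g)]
    return x


if __name__ == "__main__":
    print(run_spsa())
```

For a quadratic objective like `sphere`, the central difference is exact along the sampled direction, so the only error is the randomness of the direction. For a general smooth objective, holding `c` constant leaves a bias in the estimate that does not shrink over time, which is the kind of non-diminishing error the analysis is concerned with.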

Findings

01

GD with bounded errors remains stable and converges to a neighborhood of the minimum whose size depends on the gradient errors

02

The analysis extends previous work and applies to constant step-sizes in machine learning

03

Experimental results validate the theoretical findings
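The first finding can be seen on a toy problem. The sketch below runs GD with a constant step size on f(x) = 0.5 * x^2 and adds an error of magnitude at most `eps` to every gradient. The objective, the uniform error model, and the name `noisy_gd` are assumptions made for this example; the paper's conditions are more general.

```python
import random


def noisy_gd(x0, step, eps, n_steps, seed=0):
    """GD on f(x) = 0.5 * x**2 with a bounded gradient error at every step."""
    rng = random.Random(seed)
    x = x0
    for _ in range(n_steps):
        grad = x  # exact gradient of 0.5 * x**2
        err = rng.uniform(-eps, eps)  # bounded error that never diminishes
        x = x - step * (grad + err)
    return x


if __name__ == "__main__":
    for eps in (0.0, 0.1, 0.5):
        x_final = noisy_gd(x0=10.0, step=0.1, eps=eps, n_steps=500)
        print(f"eps={eps}: final |x| = {abs(x_final):.4f}")
```

In this toy case each update satisfies |x_{n+1}| <= (1 - step) * |x_n| + step * eps, so the iterates stay bounded and end up within distance `eps` of the minimizer at 0, up to a term that decays geometrically. Setting `eps` to 0 recovers ordinary GD, which converges to the minimizer itself.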

Abstract

The main aim of this paper is to provide an analysis of gradient descent (GD) algorithms with gradient errors that do not necessarily vanish asymptotically. In particular, sufficient conditions are presented for both stability (almost sure boundedness of the iterates) and convergence of GD with bounded, (possibly) non-diminishing gradient errors. In addition to ensuring stability, such an algorithm is shown to converge to a small neighborhood of the minimum set, which depends on the gradient errors. It is worth noting that the main result of this paper can be used to show that GD with asymptotically vanishing errors indeed converges to the minimum set. The results presented herein are not only more general when compared to previous results, but our analysis of GD with errors is new to the literature to the best of our knowledge. Our work extends the contributions of Mangasarian &…
