Weak error analysis for stochastic gradient descent optimization algorithms

Aritz Bercher; Lukas Gonon; Arnulf Jentzen; Diyora Salimova

arXiv:2007.02723 · math.NA · July 22, 2020

TL;DR

This paper analyzes how the weak error of stochastic gradient descent (SGD) algorithms decays, showing that, under suitable assumptions, the error with respect to a test function decays at the same rate as the error with respect to the objective function.

Contribution

It provides a theoretical framework for the decay of weak errors in SGD, extending convergence analysis from the objective function to test functions that may differ from it.
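To make the error criteria concrete, here is one common way to write them. The symbols (f, F, X_n, γ_n, ψ, θ*) are our own illustrative notation, not taken from the paper, and the paper's precise setting and assumptions may differ.

```latex
\begin{aligned}
&\text{objective and SGD recursion:} &&
  f(\theta) = \mathbb{E}\bigl[F(\theta, X)\bigr], \qquad
  \Theta_{n+1} = \Theta_n - \gamma_n \,\nabla_\theta F(\Theta_n, X_{n+1}),\\
&\text{strong (mean-square) error:} &&
  \mathbb{E}\bigl[\lVert \Theta_n - \theta^\ast \rVert^2\bigr],\\
&\text{objective error:} &&
  \mathbb{E}\bigl[f(\Theta_n)\bigr] - f(\theta^\ast),\\
&\text{weak error for a test function } \psi: &&
  \bigl\lvert \mathbb{E}\bigl[\psi(\Theta_n)\bigr] - \psi(\theta^\ast) \bigr\rvert .
\end{aligned}
```

Choosing ψ = f recovers the objective error, because f(Θ_n) ≥ f(θ*) when θ* minimizes f, so the absolute value can be dropped. In this sense the weak error generalizes the objective error to arbitrary test functions.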

Findings

01

Weak error decays at the same rate as strong error under certain assumptions.

02

SGD type schemes, the subject of the analysis, are used in many machine learning applications, including natural language processing and image recognition.

03

The main result shows that the error with respect to a test function other than the objective decays at the same rate as the error with respect to the objective function.
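Two standard estimates, which are not necessarily the paper's proof technique, indicate why a weak error can track the mean-square strong error rather than its square root. Here Θ_n is the SGD iterate and θ* the minimizer (our notation). If ψ is merely L-Lipschitz, only the root-mean-square rate follows; if ψ is differentiable with a K-Lipschitz gradient, a second-order Taylor bound splits the weak error into a bias term and a mean-square term:

```latex
\begin{aligned}
\bigl\lvert \mathbb{E}[\psi(\Theta_n)] - \psi(\theta^\ast) \bigr\rvert
  &\le L\, \mathbb{E}\bigl[\lVert \Theta_n - \theta^\ast \rVert\bigr]
   \le L \bigl(\mathbb{E}\bigl[\lVert \Theta_n - \theta^\ast \rVert^2\bigr]\bigr)^{1/2}
  && (\psi \text{ is } L\text{-Lipschitz}),\\
\bigl\lvert \mathbb{E}[\psi(\Theta_n)] - \psi(\theta^\ast) \bigr\rvert
  &\le \bigl\lvert \bigl\langle \nabla\psi(\theta^\ast),\, \mathbb{E}[\Theta_n] - \theta^\ast \bigr\rangle \bigr\rvert
   + \tfrac{K}{2}\, \mathbb{E}\bigl[\lVert \Theta_n - \theta^\ast \rVert^2\bigr]
  && (\nabla\psi \text{ is } K\text{-Lipschitz}).
\end{aligned}
```

The first term of the second bound involves only the bias E[Θ_n] − θ*, so whenever that bias decays at least as fast as the mean-square error, the weak error inherits the mean-square rate.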

Abstract

Stochastic gradient descent (SGD) type optimization schemes are fundamental ingredients in a large number of machine learning based algorithms. In particular, SGD type optimization schemes are frequently employed in applications involving natural language processing, object and face recognition, fraud detection, computational advertisement, and numerical approximations of partial differential equations. In mathematical convergence results for SGD type optimization schemes there are usually two types of error criteria studied in the scientific literature, that is, the error in the strong sense and the error with respect to the objective function. In applications one is often not only interested in the size of the error with respect to the objective function but also in the size of the error with respect to a test function which is possibly different from the objective function. The…
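The abstract contrasts the strong error, the error with respect to the objective function, and the error with respect to a test function. The following is a minimal sketch on a toy problem of our own choosing (not from the paper): SGD for f(θ) = E[(θ − X)²]/2 with X ~ N(μ, σ²), step sizes γ_n = 1/(n + 2), and the hypothetical test function ψ = cos. Each update is affine in Gaussian noise, so the iterate Θ_n is exactly Gaussian, and the sketch propagates its mean and variance instead of sampling paths. The names `sgd_error_profile` and `ErrorSnapshot` are illustrative. The strong error is measured here as the mean squared error E|Θ_n − θ*|²; the paper may use a different norm or power.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class ErrorSnapshot:
    """Error criteria after n SGD steps (hypothetical helper type)."""
    n: int
    mse: float        # strong error, measured here as E|Theta_n - theta*|^2
    objective: float  # E[f(Theta_n)] - f(theta*)
    weak: float       # |E[psi(Theta_n)] - psi(theta*)| with psi = cos


def sgd_error_profile(mu: float, sigma: float, theta0: float, steps: int) -> List[ErrorSnapshot]:
    """Exact error criteria for SGD on f(theta) = E[(theta - X)^2] / 2, X ~ N(mu, sigma^2).

    The stochastic gradient of (theta - X)^2 / 2 is theta - X, so the update is
    Theta_{n+1} = (1 - gamma_n) * Theta_n + gamma_n * X_{n+1}. This keeps Theta_n
    Gaussian with mean m and variance v, which are propagated exactly here.
    """
    m, v = theta0, 0.0  # deterministic start Theta_0 = theta0
    snapshots = []
    for k in range(steps):
        gamma = 1.0 / (k + 2)  # assumed step-size schedule gamma_k = 1/(k+2)
        m = (1.0 - gamma) * m + gamma * mu
        v = (1.0 - gamma) ** 2 * v + gamma ** 2 * sigma ** 2
        mse = (m - mu) ** 2 + v  # bias^2 + variance
        # f(theta) - f(mu) = (theta - mu)^2 / 2, so the objective error is mse / 2.
        objective = mse / 2.0
        # For Theta ~ N(m, v): E[cos(Theta)] = cos(m) * exp(-v / 2).
        weak = abs(math.cos(m) * math.exp(-v / 2.0) - math.cos(mu))
        snapshots.append(ErrorSnapshot(k + 1, mse, objective, weak))
    return snapshots


if __name__ == "__main__":
    profile = sgd_error_profile(mu=1.0, sigma=1.0, theta0=3.0, steps=10_000)
    for n in (100, 1_000, 10_000):
        s = profile[n - 1]
        print(f"n={n:>6}  n*mse={n * s.mse:.4f}  "
              f"n*objective={n * s.objective:.4f}  n*weak={n * s.weak:.4f}")
```

In this toy case the printed values of n·mse and n·weak level off near constants (roughly 1 and roughly 2), so both the mean-square error and the weak error for the smooth test function decay like 1/n, while the root-mean-square error √mse decays only like n^(−1/2).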

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Statistical Methods and Inference · Sparse and Compressive Sensing Techniques

Methods: Stochastic Gradient Descent