Optimality of the final model found via Stochastic Gradient Descent

Andrea Schioppa

arXiv:1810.09418 · cs.LG · October 23, 2018

TL;DR

This paper investigates the convergence properties of Stochastic Gradient Descent (SGD) for convex functions without smoothness or strict convexity assumptions, focusing on the optimality of the final model obtained.

Contribution

It provides high-probability guarantees that the objective value at the final SGD model is close to the minimum, extending online learning results of [Zin03], which concern a rolling average of the SGD iterates, to the final model.

Findings

1. With high probability, the objective at the final SGD model is close to its minimum.

2. No assumptions of smoothness or strict convexity are needed.

3. The result is compared with the online learning techniques of [Zin03], which take a rolling average of the model parameters over the SGD steps.
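One standard way to write the setting in symbols is sketched below. The notation (F, the sampled loss f(x; ξ), the feasible set K, the projection Π_K, the step sizes η_t, the tolerance ε and confidence level 1 - δ) is assumed for illustration and is not taken from the paper; the last line only restates the form of the question posed in the abstract, not the paper's bound or rate.

```latex
% Objective: expected value of convex, possibly non-smooth, sampled losses
\[ F(x) = \mathbb{E}_{\xi}\bigl[f(x;\xi)\bigr], \qquad F^{\star} = \min_{x \in K} F(x) \]
% Projected stochastic subgradient step, with g_t \in \partial_x f(x_t;\xi_t)
\[ x_{t+1} = \Pi_K\bigl(x_t - \eta_t\, g_t\bigr), \qquad t = 1, \dots, T \]
% The two candidate minimizers being compared
\[ x_{T+1} \ \text{(final model)}, \qquad \bar{x}_T = \frac{1}{T}\sum_{t=1}^{T} x_t \ \text{(rolling average, as in [Zin03])} \]
% Form of the question: for a tolerance \varepsilon and confidence level 1 - \delta
\[ \Pr\bigl[\, F(x_{T+1}) - F^{\star} \le \varepsilon \,\bigr] \ge 1 - \delta \]
```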

Abstract

We study convergence properties of Stochastic Gradient Descent (SGD) for convex objectives without assumptions on smoothness or strict convexity. We consider the question of establishing that with high probability the objective evaluated at the candidate minimizer returned by SGD is close to the minimal value of the objective. We compare this result concerning the final candidate minimizer (i.e., the final model parameters learned after all gradient steps) to the online learning techniques of [Zin03] that take a rolling average of the model parameters at the different steps of SGD.
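To make the two candidates in the abstract concrete, here is a toy sketch, not the paper's method or experiments. It runs projected stochastic subgradient descent on a hypothetical one-dimensional objective F(x) = E|x - ξ| with ξ uniform on {-1, +1}. This objective is convex but neither smooth (kinks at x = -1 and x = +1) nor strictly convex (every x in [-1, 1] is a minimizer, with F* = 1). The step size c/sqrt(t), the projection radius, the starting point, and all function names are assumptions chosen for illustration.

```python
import math
import random


def objective(x: float) -> float:
    """Hypothetical toy objective F(x) = E|x - xi|, xi uniform on {-1, +1}.

    Convex, non-smooth at x = -1 and x = +1, and not strictly convex:
    every x in [-1, 1] attains the minimum value F* = 1.
    """
    return 0.5 * (abs(x - 1.0) + abs(x + 1.0))


def stochastic_subgradient(x: float, xi: float) -> float:
    """A subgradient of the sampled loss |x - xi| (0 is chosen at the kink)."""
    if x > xi:
        return 1.0
    if x < xi:
        return -1.0
    return 0.0


def projected_sgd(x0: float, steps: int, radius: float, c: float, seed: int):
    """Projected stochastic subgradient descent with assumed step size c / sqrt(t).

    Returns (final iterate after `steps` updates, average of the iterates
    x_1, ..., x_T at which subgradients were evaluated).
    """
    rng = random.Random(seed)
    x = x0
    running_sum = 0.0
    for t in range(1, steps + 1):
        running_sum += x                      # accumulate x_t for the rolling average
        xi = rng.choice((-1.0, 1.0))          # draw one sample
        g = stochastic_subgradient(x, xi)
        x = x - (c / math.sqrt(t)) * g        # subgradient step
        x = max(-radius, min(radius, x))      # project onto [-radius, radius]
    return x, running_sum / steps


if __name__ == "__main__":
    final_x, avg_x = projected_sgd(x0=5.0, steps=10_000, radius=10.0, c=1.0, seed=0)
    print("excess at final iterate:   ", objective(final_x) - 1.0)
    print("excess at averaged iterate:", objective(avg_x) - 1.0)
```

On this toy problem both candidates land close to F* = 1. The example says nothing about how small the gap is with high probability in general, which is what the paper's analysis of the final model is about.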

Taxonomy

Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Stochastic Gradient Optimization Techniques

Methods: Stochastic Gradient Descent