Optimality of the final model found via Stochastic Gradient Descent
Andrea Schioppa

TL;DR
This paper investigates the convergence properties of Stochastic Gradient Descent (SGD) for convex functions without smoothness or strict convexity assumptions, focusing on the optimality of the final model obtained.
Contribution
It provides theoretical guarantees that the final SGD model's objective value is close to the minimum, extending previous online learning results to the final model.
Findings
The final SGD model's objective value is close to the minimum with high probability.
No assumptions on smoothness or strict convexity are needed.
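The final-model result is compared with the online learning techniques of [Zin03], which take a rolling average of the model parameters over the SGD steps.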
Abstract
We study convergence properties of Stochastic Gradient Descent (SGD) for convex objectives without assumptions on smoothness or strict convexity. We consider the question of establishing that with high probability the objective evaluated at the candidate minimizer returned by SGD is close to the minimal value of the objective. We compare this result concerning the final candidate minimizer (i.e. the final model parameters learned after all gradient steps) to the online learning techniques of [Zin03] that take a rolling average of the model parameters at the different steps of SGD.
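The distinction between the final candidate minimizer and the rolling average can be illustrated with a small sketch. It is not the paper's setting or proof technique: the objective (mean absolute deviation of a 1-D sample, which is convex but neither smooth nor strictly convex), the step size scale / sqrt(t), the projection radius, and the names objective and sgd_final_and_average are all assumptions chosen for illustration.

```python
import math
import random


def objective(w, data):
    """Mean absolute deviation: convex, but neither smooth nor strictly convex."""
    return sum(abs(w - x) for x in data) / len(data)


def sgd_final_and_average(data, steps=20000, radius=10.0, scale=1.0, seed=0):
    """Projected stochastic subgradient descent on objective(w, data).

    Returns (final iterate, average of the iterates visited before each update).
    The step size scale / sqrt(t) and the projection onto [-radius, radius]
    are illustrative assumptions, not values taken from the paper.
    """
    rng = random.Random(seed)
    w = 0.0
    running_avg = 0.0
    for t in range(1, steps + 1):
        # Incremental mean of the iterates seen so far (the averaged model).
        running_avg += (w - running_avg) / t
        x = rng.choice(data)
        # A subgradient of |w - x| in w: +1, -1, or 0 at the kink.
        g = (w > x) - (w < x)
        w -= scale / math.sqrt(t) * g
        # Project back onto the feasible interval.
        w = max(-radius, min(radius, w))
    return w, running_avg


if __name__ == "__main__":
    sample = [-2.0, 0.5, 1.0, 3.0, 7.0]  # the median 1.0 minimizes the objective
    w_final, w_avg = sgd_final_and_average(sample)
    f_min = objective(1.0, sample)
    print("gap of final iterate:   ", objective(w_final, sample) - f_min)
    print("gap of averaged iterate:", objective(w_avg, sample) - f_min)
```

Both returned points can be compared by their objective gap to the minimum; the paper's question concerns the first of the two, the model actually held after the last gradient step.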
Taxonomy
Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Stochastic Gradient Optimization Techniques
Methods: Stochastic Gradient Descent
