Stochastic Gradient Descent in the Viewpoint of Graduated Optimization
Da Li, Jingjing Wu, Qingrun Zhang

TL;DR
This paper analyzes stochastic gradient descent (SGD) through the lens of graduated optimization. It provides a formal framework and a convergence analysis, and it shows that the approach has the potential to improve training accuracy on non-convex machine learning problems.
Contribution
The paper introduces a formal formulation of graduated optimization based on a nonnegative approximate identity and shows that SGD can solve the resulting smoothed problems with convergence guarantees.
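For orientation, the smoothing idea can be written in the standard textbook notation for an approximate identity. This is a sketch under assumptions: the symbols f, \varphi, \sigma, and N are chosen here for illustration and may not match the paper's own notation, and the gradient identity assumes f is differentiable with enough regularity to exchange differentiation and integration.

```latex
% Nonnegative approximate identity built from a kernel \varphi on R^n
% (illustrative notation, not taken from the paper):
\varphi \ge 0, \qquad \int_{\mathbb{R}^n} \varphi(y)\,dy = 1, \qquad
\varphi_\sigma(y) = \sigma^{-n}\,\varphi\!\left(\frac{y}{\sigma}\right), \quad \sigma > 0.

% Smoothed objective: convolution of f with \varphi_\sigma
f_\sigma(x) = (f * \varphi_\sigma)(x) = \int_{\mathbb{R}^n} f(x - y)\,\varphi_\sigma(y)\,dy .

% Gradient as an expectation, and its Monte Carlo estimate
% with samples y_1, \dots, y_N drawn from the density \varphi_\sigma:
\nabla f_\sigma(x) = \mathbb{E}_{y \sim \varphi_\sigma}\big[\nabla f(x - y)\big]
\approx \frac{1}{N} \sum_{i=1}^{N} \nabla f(x - y_i).
```

Choosing \varphi as the standard Gaussian density gives Gaussian smoothing as a special case, and letting \sigma decrease toward 0 moves from heavily smoothed problems back toward the original one.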
Findings
SGD can be applied to smoothed optimization problems.
Graduated optimization can lead to more accurate training results.
Convergence of SGD on smoothed problems is established.
Abstract
The stochastic gradient descent (SGD) method is popular for solving non-convex optimization problems in machine learning. This work investigates SGD from the viewpoint of graduated optimization, a widely applied approach for non-convex optimization problems. Instead of solving the actual optimization problem directly, the graduated optimization approach solves a series of smoothed optimization problems, which can be constructed in various ways. In this work, a formal formulation of graduated optimization is provided based on the nonnegative approximate identity, which generalizes the idea of Gaussian smoothing. An asymptotic convergence result is also obtained using techniques from variational analysis. Then, we show that the traditional SGD method can be applied to solve the smoothed optimization problem. Monte Carlo integration is used to obtain the gradient in the smoothed problem,…
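The idea of estimating the smoothed gradient by Monte Carlo integration and feeding it to SGD can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes Gaussian smoothing (one special case of a nonnegative approximate identity), and the test objective `f`, the smoothing schedule `sigmas`, and all hyperparameters are hypothetical choices made only for illustration.

```python
import numpy as np

def f(x):
    # Hypothetical 1-D non-convex objective (not from the paper).
    return x**2 + 2.0 * np.sin(5.0 * x)

def grad_f(x):
    # Analytic derivative of the hypothetical objective above.
    return 2.0 * x + 10.0 * np.cos(5.0 * x)

def smoothed_grad(x, sigma, n_samples, rng):
    # Monte Carlo estimate of the gradient of the Gaussian-smoothed
    # objective E[f(x + sigma * u)], u ~ N(0, 1), obtained by
    # averaging grad_f over random perturbations of x.
    u = rng.standard_normal(n_samples)
    return float(np.mean(grad_f(x + sigma * u)))

def graduated_sgd(x0, sigmas, steps_per_level=200, lr=0.01,
                  n_samples=8, seed=0):
    # Run SGD on a sequence of smoothed problems, from the most
    # smoothed (large sigma) to the least smoothed (small sigma).
    # Assumed schedule: each level starts from the previous solution.
    rng = np.random.default_rng(seed)
    x = float(x0)
    for sigma in sigmas:
        for _ in range(steps_per_level):
            x -= lr * smoothed_grad(x, sigma, n_samples, rng)
    return x

# Example run with an assumed decreasing smoothing schedule;
# sigma = 0.0 corresponds to the original, unsmoothed problem.
x_final = graduated_sgd(x0=3.0, sigmas=[2.0, 1.0, 0.5, 0.1, 0.0])
print(x_final, f(x_final))
```

Each call to `smoothed_grad` is a noisy but unbiased estimate of the smoothed gradient, so the inner loop is ordinary SGD applied to the smoothed objective. Whether a given schedule reaches a better minimum than plain SGD depends on the objective and on the hyperparameters.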
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Markov Chains and Monte Carlo Methods
