Convergence Analysis of Homotopy-SGD for non-convex optimization
Matilde Gargiani, Andrea Zanelli, Quoc Tran-Dinh, Moritz Diehl, Frank Hutter

TL;DR
This paper introduces Homotopy-SGD, a stochastic optimization algorithm that achieves faster convergence rates for non-convex problems by combining homotopy methods with SGD, supported by theoretical analysis and experimental validation.
Contribution
It presents a novel homotopy-based stochastic gradient method with proven linear convergence under mild assumptions, improving upon standard SGD in non-convex optimization.
Findings
H-SGD converges linearly to a neighborhood of the minimum.
H-SGD outperforms standard SGD in experiments.
Theoretical analysis confirms fast convergence rates.
Abstract
First-order stochastic methods for solving large-scale non-convex optimization problems are widely used in many big-data applications, e.g. training deep neural networks as well as other complex and potentially non-convex machine learning models. Their inexpensive iterations generally come together with slow global convergence rate (mostly sublinear), leading to the necessity of carrying out a very high number of iterations before the iterates reach a neighborhood of a minimizer. In this work, we present a first-order stochastic algorithm based on a combination of homotopy methods and SGD, called Homotopy-Stochastic Gradient Descent (H-SGD), which finds interesting connections with some proposed heuristics in the literature, e.g. optimization by Gaussian continuation, training by diffusion, mollifying networks. Under some mild assumptions on the problem structure, we conduct a…
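As a rough illustration of the idea of combining homotopy with SGD, the sketch below runs SGD on a sequence of progressively harder problems and warm-starts each stage from the previous stage's iterate. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy objective `target_loss`, the convex surrogate 0.5·||x||², the linear homotopy schedule, the additive Gaussian gradient noise, and all function and parameter names.

```python
import numpy as np


def target_loss(x):
    """Toy non-convex target (hypothetical): sum of 0.5*x^2 + cos(3x)."""
    return float(np.sum(0.5 * x**2 + np.cos(3.0 * x)))


def homotopy_grad(x, lam):
    """Gradient of the assumed homotopy map
    h(x, lam) = 0.5*||x||^2 + lam * sum(cos(3x)).
    lam = 0 gives a convex surrogate; lam = 1 gives the target."""
    return x - lam * 3.0 * np.sin(3.0 * x)


def sgd(x, lam, steps, lr, noise_std, rng):
    """Plain SGD on h(., lam); gradient noise is simulated with Gaussian noise."""
    for _ in range(steps):
        noisy_grad = homotopy_grad(x, lam) + noise_std * rng.standard_normal(x.shape)
        x = x - lr * noisy_grad
    return x


def homotopy_sgd(x0, n_stages=5, steps_per_stage=200, lr=0.05,
                 noise_std=0.1, seed=0):
    """Solve a sequence of problems with lam going from 0 to 1,
    warm-starting each stage from the previous stage's final iterate."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for lam in np.linspace(0.0, 1.0, n_stages):
        x = sgd(x, lam, steps_per_stage, lr, noise_std, rng)
    return x


if __name__ == "__main__":
    x_start = np.array([3.0, -3.0])
    x_final = homotopy_sgd(x_start)
    print("start loss:", target_loss(x_start))
    print("final loss:", target_loss(x_final))
```

The warm start is the key design choice in this sketch: each stage begins close to a minimizer of the previous, slightly easier problem, so SGD only has to track a slowly moving solution rather than search from scratch.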
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Machine Learning and ELM
Methods: Stochastic Gradient Descent
