Convergence Analysis of Homotopy-SGD for non-convex optimization

Matilde Gargiani; Andrea Zanelli; Quoc Tran-Dinh; Moritz Diehl; Frank Hutter

arXiv:2011.10298 · cs.LG · November 23, 2020

TL;DR

This paper introduces Homotopy-SGD (H-SGD), a stochastic optimization algorithm for non-convex problems that combines homotopy methods with SGD and converges faster than standard SGD. The claims are supported by theoretical analysis and experimental validation.

Contribution

The paper presents a novel homotopy-based stochastic gradient method that provably converges linearly to a neighborhood of a minimizer under mild assumptions on the problem structure, improving on the mostly sublinear rates of standard SGD in non-convex optimization.

Findings

1. H-SGD converges linearly to a neighborhood of a minimizer.

2. H-SGD outperforms standard SGD in the reported experiments.

3. The theoretical analysis establishes these fast convergence rates under mild assumptions on the problem structure.
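"Linear convergence to a neighborhood" has a standard meaning that may help when reading the findings. The derivation below is a generic illustration, not the paper's theorem: e_k ≥ 0 is some error measure at iteration k (for example a suboptimality gap), and ρ ∈ (0, 1) and c ≥ 0 are placeholder constants whose actual values and assumptions are those stated in the paper.

```latex
\mathbb{E}[e_{k+1}] \le \rho\,\mathbb{E}[e_k] + c
\quad\Longrightarrow\quad
\mathbb{E}[e_k] \le \rho^{k} e_0 + c\sum_{j=0}^{k-1}\rho^{j}
\le \rho^{k} e_0 + \frac{c}{1-\rho}.
```

The first term decays geometrically (a straight line on a log scale, hence "linear"); the second term, c/(1 − ρ), is the radius of the neighborhood, which does not vanish when c > 0, for instance because of stochastic gradient noise.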

Abstract

First-order stochastic methods for solving large-scale non-convex optimization problems are widely used in many big-data applications, e.g. training deep neural networks as well as other complex and potentially non-convex machine learning models. Their inexpensive iterations generally come with a slow global convergence rate (mostly sublinear), so a very large number of iterations is needed before the iterates reach a neighborhood of a minimizer. In this work, we present a first-order stochastic algorithm based on a combination of homotopy methods and SGD, called Homotopy-Stochastic Gradient Descent (H-SGD), which has interesting connections with some heuristics proposed in the literature, e.g. optimization by Gaussian continuation, training by diffusion, mollifying networks. Under some mild assumptions on the problem structure, we conduct a…
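The abstract describes H-SGD only as a combination of homotopy methods and SGD. The sketch below illustrates that general idea under stated assumptions and is not the authors' algorithm: a homotopy parameter t is stepped from 0 (an easy problem) to 1 (the target problem), plain SGD runs a fixed number of steps at each t, and each stage is warm-started from the previous one. As one concrete homotopy it uses Gaussian continuation, one of the heuristics the abstract mentions, smoothing f with Gaussian noise of standard deviation sigma_max·(1 − t). The names `homotopy_sgd`, `gaussian_continuation_grad` and `example_df`, the uniform schedule, the constant step size, and the test function are all hypothetical choices made for illustration.

```python
import math
import random
from typing import Callable, List

# Hypothetical interface: stochastic_grad(x, t, rng) returns a stochastic estimate of
# the gradient of a homotopy objective h(x, t); t = 0 is the "easy" problem and
# t = 1 is the target problem. This is an illustrative assumption, not the paper's API.
StochGrad = Callable[[float, float, random.Random], float]


def homotopy_sgd(
    stochastic_grad: StochGrad,
    x0: float,
    num_stages: int = 10,
    steps_per_stage: int = 100,
    lr: float = 0.05,
    seed: int = 0,
) -> List[float]:
    """Run plain SGD on h(., t_k) for t_k = 0, 1/K, ..., 1 (K = num_stages),
    warm-starting each stage from the previous stage's last iterate.
    Returns the last iterate of every stage; the final entry targets h(., 1)."""
    if num_stages < 1:
        raise ValueError("num_stages must be >= 1")
    rng = random.Random(seed)
    x = x0
    stage_ends: List[float] = []
    for k in range(num_stages + 1):
        t = k / num_stages  # uniform homotopy schedule (an assumption)
        for _ in range(steps_per_stage):
            x -= lr * stochastic_grad(x, t, rng)  # constant step size (an assumption)
        stage_ends.append(x)
    return stage_ends


def gaussian_continuation_grad(
    df: Callable[[float], float], sigma_max: float
) -> StochGrad:
    """Homotopy via Gaussian smoothing: h(x, t) = E_u[f(x + sigma(t) * u)],
    u ~ N(0, 1), sigma(t) = sigma_max * (1 - t), so h(., 1) = f.
    For differentiable f, df(x + sigma * u) is a one-sample unbiased estimate
    of the derivative of the smoothed objective."""
    def grad(x: float, t: float, rng: random.Random) -> float:
        sigma = sigma_max * (1.0 - t)
        return df(x + sigma * rng.gauss(0.0, 1.0))
    return grad


def example_df(x: float) -> float:
    """Derivative of the hypothetical non-convex f(x) = 0.5 x^2 + 2 cos(3x)."""
    return x - 6.0 * math.sin(3.0 * x)


if __name__ == "__main__":
    ends = homotopy_sgd(gaussian_continuation_grad(example_df, sigma_max=2.0), x0=4.0)
    print("end of each stage:", [round(v, 3) for v in ends])
```

Warm-starting is the central design choice here: each stage starts near a solution of a slightly easier (smoother) problem, and only the final stage (t = 1, sigma = 0) optimizes the original objective with exact gradients of f.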

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Machine Learning and ELM

Methods: Stochastic Gradient Descent