Fast Global Convergence via Landscape of Empirical Loss

Chao Qu; Yan Li; Huan Xu

arXiv:1802.04617 · stat.ML · February 14, 2018

Open Access

TL;DR

This paper demonstrates that stochastic variance reduction methods can efficiently find the global optimum for certain non-convex loss functions in machine learning, leveraging the landscape of empirical loss.

Contribution

It proves linear convergence of stochastic variance reduction methods for non-convex M-estimators by exploiting the statistical properties of the population loss.
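
For orientation, the block below writes out a standard form of the variance-reduced update (SVRG) and what a linear rate means for it. The notation is an assumption made for illustration and is not taken from the paper: the empirical loss $F_n$, per-sample losses $f_i$, snapshot $\tilde\theta$, step size $\eta$, and contraction factor $\rho$ are labels chosen here, and the paper's exact conditions and constants may differ.

```latex
% Empirical loss of an M-estimator over n samples (notation assumed here)
F_n(\theta) = \frac{1}{n} \sum_{i=1}^{n} f_i(\theta)

% SVRG inner step: draw i_t uniformly from {1, ..., n}, then correct the
% stochastic gradient using a full gradient computed at the snapshot \tilde\theta
v_t = \nabla f_{i_t}(\theta_t) - \nabla f_{i_t}(\tilde\theta) + \nabla F_n(\tilde\theta),
\qquad
\theta_{t+1} = \theta_t - \eta \, v_t

% "Linear convergence" in the usual sense: after s outer epochs the expected
% optimality gap shrinks geometrically, for some 0 < \rho < 1
\mathbb{E}\big[ F_n(\tilde\theta_s) - F_n(\hat\theta) \big]
  \le \rho^{s} \, \big( F_n(\tilde\theta_0) - F_n(\hat\theta) \big)
```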

Findings

01

Stochastic variance reduction methods achieve global optimality with linear convergence.

02

Improved convergence analysis for batch gradient methods.

03

Insights into the landscape of empirical loss for non-convex optimization.

Abstract

While optimizing convex objective (loss) functions has been a powerhouse of machine learning for at least two decades, non-convex loss functions have recently attracted fast-growing interest because of desirable properties such as superior robustness and classification accuracy compared with their convex counterparts. The main obstacle for non-convex estimators is that finding the optimal solution is, in general, intractable. In this paper, we study the computational issues for some non-convex M-estimators. In particular, we show that stochastic variance reduction methods converge to the global optimum at a linear rate by exploiting the statistical properties of the population loss. En route, we improve the convergence analysis for the batch gradient method in \cite{mei2016landscape}.
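
To make the abstract's claim concrete, here is a minimal sketch of a stochastic variance reduction method (plain SVRG) applied to one non-convex loss. It is not the authors' implementation. The loss (squared error through a sigmoid link), the noiseless synthetic data, the step size, the epoch count, and all function names are assumptions chosen for illustration.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sample_grad(theta, x, y):
    # Gradient of the single-sample non-convex loss (y - sigmoid(<x, theta>))^2.
    s = sigmoid(dot(x, theta))
    coef = -2.0 * (y - s) * s * (1.0 - s)
    return [coef * a for a in x]


def full_grad(theta, X, Y):
    # Gradient of the empirical loss: the average of the per-sample gradients.
    n = len(X)
    g = [0.0] * len(theta)
    for x, y in zip(X, Y):
        for j, gj in enumerate(sample_grad(theta, x, y)):
            g[j] += gj / n
    return g


def empirical_loss(theta, X, Y):
    return sum((y - sigmoid(dot(x, theta))) ** 2 for x, y in zip(X, Y)) / len(X)


def svrg(X, Y, theta0, step=0.1, epochs=40, seed=0):
    # Plain SVRG: each epoch computes one full gradient at a snapshot, then
    # takes n cheap stochastic steps corrected by that full gradient.
    rng = random.Random(seed)
    n = len(X)
    snapshot = list(theta0)
    for _ in range(epochs):
        mu = full_grad(snapshot, X, Y)
        theta = list(snapshot)
        for _ in range(n):
            i = rng.randrange(n)
            g_cur = sample_grad(theta, X[i], Y[i])
            g_snap = sample_grad(snapshot, X[i], Y[i])
            theta = [t - step * (gc - gs + m)
                     for t, gc, gs, m in zip(theta, g_cur, g_snap, mu)]
        snapshot = theta  # last inner iterate becomes the next snapshot
    return snapshot


# Synthetic, noiseless data (an assumption made to keep the example simple):
# labels are generated exactly by a hypothetical ground truth theta_star.
data_rng = random.Random(1)
theta_star = [1.0, -0.5]
X = [[data_rng.gauss(0.0, 1.0) for _ in theta_star] for _ in range(200)]
Y = [sigmoid(dot(x, theta_star)) for x in X]

theta0 = [0.0, 0.0]
theta_hat = svrg(X, Y, theta0)
print("loss at start:", empirical_loss(theta0, X, Y))
print("loss at end:  ", empirical_loss(theta_hat, X, Y))
print("estimate:     ", theta_hat)
```

With noiseless labels the empirical loss is minimized at `theta_star`, so the distance between `theta_hat` and `theta_star` gives a direct check on whether the iterates reach the global optimum in this toy setting. Real M-estimation problems have noisy labels, in which case the minimizer of the empirical loss is only close to the ground truth.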

Taxonomy

Topics: Sparse and Compressive Sensing Techniques · Stochastic Gradient Optimization Techniques · Machine Learning and Algorithms