Why Do Local Methods Solve Nonconvex Problems?

Tengyu Ma

arXiv:2103.13462·cs.LG·March 26, 2021·ASE


TL;DR

This paper investigates why local optimization methods, such as stochastic gradient descent, succeed on the non-convex problems that arise in machine learning even though non-convex optimization is NP-hard in the worst case. It formalizes one proposed explanation for concrete problem instances.

Contribution

It rigorously formalizes, for concrete instances of machine learning problems, the hypothesis that most local minima of practically used objectives are approximately global minima, which would explain why local methods succeed.

Findings

01. The hypothesis under study is that most local minima of practically used objectives are approximately global minima

02. The formalization is carried out for concrete instances of machine learning problems

03. The result supports the observed effectiveness of local optimization methods

Abstract

Non-convex optimization is ubiquitous in modern machine learning. Researchers devise non-convex objective functions and optimize them using off-the-shelf optimizers such as stochastic gradient descent and its variants, which leverage the local geometry and update iteratively. Even though solving non-convex functions is NP-hard in the worst case, the optimization quality in practice is often not an issue -- optimizers are largely believed to find approximate global minima. Researchers hypothesize a unified explanation for this intriguing phenomenon: most of the local minima of the practically-used objectives are approximately global minima. We rigorously formalize it for concrete instances of machine learning problems.
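To make the hypothesis concrete, the sketch below runs plain gradient descent from many random starting points on a small non-convex problem and checks how far each run ends from the global minimum. The problem (a rank-1 fit to a fixed symmetric matrix), the matrix `M`, the step size, and all function names are assumptions chosen for illustration; the abstract above does not say which instances the paper analyzes. This toy objective is one where every local minimum is known to be global, so it shows what the hypothesis predicts rather than testing it.

```python
import random

# Hypothetical toy instance (an assumption, not taken from the text above):
# minimize f(x) = 1/4 * ||M - x x^T||_F^2 for a fixed symmetric PSD matrix M.
# f is non-convex in x, yet its local minima are all global minima.
M = [[3.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 0.5]]
N = len(M)

# The eigenvalues are 3 > 1 > 0.5. The best rank-1 fit captures the largest
# one and leaves the other two unexplained, so the global minimum value is
# (1^2 + 0.5^2) / 4.
GLOBAL_MIN = (1.0 ** 2 + 0.5 ** 2) / 4.0


def objective(x):
    """f(x) = 1/4 * sum_ij (M_ij - x_i * x_j)^2."""
    return 0.25 * sum((M[i][j] - x[i] * x[j]) ** 2
                      for i in range(N) for j in range(N))


def gradient(x):
    """grad f(x) = (x x^T - M) x, which holds because M is symmetric."""
    sq_norm = sum(v * v for v in x)
    Mx = [sum(M[i][j] * x[j] for j in range(N)) for i in range(N)]
    return [sq_norm * x[i] - Mx[i] for i in range(N)]


def gradient_descent(x0, step=0.05, iters=2000):
    """Fixed-step gradient descent; step and iters are illustrative choices."""
    x = list(x0)
    for _ in range(iters):
        g = gradient(x)
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x


def run_trials(num_trials=20, seed=0):
    """Return the final objective value reached from each random start."""
    rng = random.Random(seed)
    values = []
    for _ in range(num_trials):
        x0 = [rng.uniform(-1.0, 1.0) for _ in range(N)]
        values.append(objective(gradient_descent(x0)))
    return values


if __name__ == "__main__":
    final_values = run_trials()
    worst_gap = max(v - GLOBAL_MIN for v in final_values)
    print(f"largest gap to the global minimum over all starts: {worst_gap:.2e}")
```

In this sketch the objective also has saddle points (the origin and the non-leading eigenvector directions), but a random start almost surely has a nonzero component along the leading direction, so gradient descent moves away from them. Objectives with genuinely suboptimal local minima would show a spread of final values instead.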
