
TL;DR
This paper investigates why local optimization methods such as stochastic gradient descent succeed on the non-convex problems of machine learning, even though non-convex optimization is NP-hard in the worst case. It formalizes one proposed explanation for this empirical success.
Contribution
It rigorously formalizes, for concrete instances of machine learning problems, the hypothesis that most local minima of practically used objectives are approximately global minima, which would explain the success of local methods.
Findings
Hypothesis: most local minima of practically used objectives are approximately global minima
The hypothesis is rigorously formalized for concrete instances of machine learning problems
The result helps explain why local optimizers such as stochastic gradient descent work well in practice
Abstract
Non-convex optimization is ubiquitous in modern machine learning. Researchers devise non-convex objective functions and optimize them using off-the-shelf optimizers such as stochastic gradient descent and its variants, which leverage the local geometry and update iteratively. Even though solving non-convex functions is NP-hard in the worst case, the optimization quality in practice is often not an issue -- optimizers are largely believed to find approximate global minima. Researchers hypothesize a unified explanation for this intriguing phenomenon: most of the local minima of the practically-used objectives are approximately global minima. We rigorously formalize it for concrete instances of machine learning problems.
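As a toy illustration of the hypothesis in the abstract, the sketch below runs plain gradient descent on a one-dimensional non-convex function whose local minima are all global. This example is not taken from the paper: the objective f(x) = (x^2 - 1)^2, the step size, the number of steps, and the function names are all assumptions chosen for illustration.

```python
import random


def f(x):
    """Toy non-convex objective (an assumption, not from the paper).

    It has two local minima, at x = -1 and x = +1, and both are global
    minima with value 0.
    """
    return (x * x - 1.0) ** 2


def grad_f(x):
    """Derivative of f: d/dx (x^2 - 1)^2 = 4x(x^2 - 1)."""
    return 4.0 * x * (x * x - 1.0)


def gradient_descent(x0, lr=0.02, steps=500):
    """Plain gradient descent, using only local first-order information."""
    x = x0
    for _ in range(steps):
        x -= lr * grad_f(x)
    return x


# Run from several random starting points and record the worst outcome.
rng = random.Random(0)
starts = [rng.uniform(-2.0, 2.0) for _ in range(20)]
finals = [gradient_descent(x0) for x0 in starts]
worst = max(f(x) for x in finals)
print(f"worst final objective over {len(starts)} random starts: {worst:.2e}")
```

Every run should end near x = -1 or x = +1, so the local method reaches a global minimum regardless of where it starts. The only other stationary point is the local maximum at x = 0, which a random starting point avoids with probability 1. Worst-case non-convex functions do not have this benign structure, which is why the claim has to be established separately for each concrete problem.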
