Non-convex Optimization for Machine Learning

Prateek Jain; Purushottam Kar

arXiv:1712.07897·stat.ML·December 22, 2017

TL;DR

This paper reviews recent advances in non-convex optimization techniques used in machine learning, highlighting their practical success and the theoretical understanding needed to analyze their convergence and properties.

Contribution

The monograph surveys non-convex optimization methods, connects practical heuristics with their theoretical analysis, and introduces tools for analyzing these algorithms.

Findings

01. Non-convex optimization methods often outperform convex relaxations in practice.

02. Recent theoretical advances help explain the convergence of heuristics like gradient descent.

03. The monograph offers tools for analyzing non-convex algorithms in machine learning.

Abstract

A vast majority of machine learning algorithms train their models and perform inference by solving optimization problems. In order to capture the learning and prediction problems accurately, structural constraints such as sparsity or low rank are frequently imposed or else the objective itself is designed to be a non-convex function. This is especially true of algorithms that operate in high-dimensional spaces or that train non-linear models such as tensor models and deep networks. The freedom to express the learning problem as a non-convex optimization problem gives immense modeling power to the algorithm designer, but often such problems are NP-hard to solve. A popular workaround to this has been to relax non-convex problems to convex ones and use traditional methods to solve the (convex) relaxed optimization problems. However this approach may be lossy and nevertheless presents…
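
To make the abstract's contrast concrete, here is a minimal sketch of solving a sparsity-constrained least-squares problem directly, without a convex relaxation. It uses projected gradient descent where the projection keeps the k largest-magnitude entries, a scheme commonly called iterative hard thresholding. The function names `hard_threshold` and `iht`, the step size, and the iteration count are illustrative assumptions and are not taken from this page.

```python
def hard_threshold(v, k):
    """Project v onto the k-sparse vectors: keep the k largest-magnitude entries."""
    keep = set(sorted(range(len(v)), key=lambda i: abs(v[i]), reverse=True)[:k])
    return [v[i] if i in keep else 0.0 for i in range(len(v))]


def iht(A, y, k, step, iters):
    """Sketch of projected gradient descent for
    min 0.5 * ||A x - y||^2  subject to  x having at most k non-zeros.
    A is a list of rows; step and iters are assumed tuning parameters."""
    n_rows, n_cols = len(A), len(A[0])
    x = [0.0] * n_cols
    for _ in range(iters):
        # Residual r = A x - y
        r = [sum(A[i][j] * x[j] for j in range(n_cols)) - y[i]
             for i in range(n_rows)]
        # Gradient of the least-squares objective: g = A^T r
        g = [sum(A[i][j] * r[i] for i in range(n_rows))
             for j in range(n_cols)]
        # Gradient step, then projection onto the non-convex sparsity constraint
        x = hard_threshold([x[j] - step * g[j] for j in range(n_cols)], k)
    return x


if __name__ == "__main__":
    # Toy problem with an identity design matrix (an assumption for illustration).
    A = [[1.0, 0.0, 0.0, 0.0],
         [0.0, 1.0, 0.0, 0.0],
         [0.0, 0.0, 1.0, 0.0],
         [0.0, 0.0, 0.0, 1.0]]
    y = [0.5, -3.0, 2.0, 0.1]
    print(iht(A, y, k=2, step=1.0, iters=5))
```

The constraint set here is non-convex, yet the projection onto it is cheap to compute, which is one reason such direct methods are attractive in practice. Whether the iterates converge to a good solution depends on properties of the matrix `A` and on the step size, so this sketch should not be read as carrying any guarantee.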
