The Hidden Convex Optimization Landscape of Two-Layer ReLU Neural Networks: an Exact Characterization of the Optimal Solutions

Yifei Wang; Jonathan Lacotte; Mert Pilanci

arXiv:2006.05900·cs.LG·March 15, 2022

TL;DR

This paper provides a convex optimization framework to exactly characterize all globally optimal solutions for two-layer ReLU neural networks, offering new insights into the neural network training landscape and its global minima.

Contribution

The paper introduces a convex program that characterizes all optimal solutions without relying on duality-based analysis, so the entire set of optimal neural networks can be constructed exactly from the program's solutions.

Findings

01

All global optima can be found via a convex cone program.

02

Clarke stationary points found by stochastic gradient descent correspond to the global optimum of a subsampled convex problem.

03

Provides polynomial-time checks for global optimality and constructs paths to minima.

Abstract

We prove that finding all globally optimal two-layer ReLU neural networks can be performed by solving a convex optimization program with cone constraints. Our analysis is novel, characterizes all optimal solutions, and does not leverage duality-based analysis, which was recently used to lift neural network training into convex spaces. Given the set of solutions of our convex optimization program, we show how to construct exactly the entire set of optimal neural networks. We provide a detailed characterization of this optimal set and its invariant transformations. As additional consequences of our convex perspective, (i) we establish that Clarke stationary points found by stochastic gradient descent correspond to the global optimum of a subsampled convex problem; (ii) we provide a polynomial-time algorithm for checking if a neural network is a global minimum of the training loss; (iii) we…
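
The abstract does not spell out the convex program, so the following is only a minimal sketch of one ingredient, assuming the standard activation-pattern view of ReLU networks: for a fixed data matrix `X`, each hidden neuron `u_j` induces a 0/1 pattern `D_j = diag(1[X u_j >= 0])`, the ReLU output equals the linear expression `D_j X u_j`, and the neuron satisfies the cone constraint `(2 D_j - I) X u_j >= 0`. The names `relu_network` and `activation_patterns`, the random data, and the problem sizes are all hypothetical choices for illustration, not taken from the paper.

```python
import numpy as np

def relu_network(X, U, a):
    """Two-layer ReLU network output: sum_j relu(X @ u_j) * a_j."""
    return np.maximum(X @ U, 0.0) @ a

def activation_patterns(X, U):
    """Return an (n, m) 0/1 matrix; column j is the diagonal of D_j."""
    return (X @ U >= 0).astype(float)

# Hypothetical toy problem: n samples, d features, m hidden neurons.
rng = np.random.default_rng(0)
n, d, m = 8, 3, 5
X = rng.standard_normal((n, d))
U = rng.standard_normal((d, m))   # first-layer weights, one column per neuron
a = rng.standard_normal(m)        # second-layer weights

D = activation_patterns(X, U)
pre = X @ U                       # pre-activations, shape (n, m)

# With the patterns fixed, the network is linear in the weights:
# relu(X u_j) == D_j X u_j for every neuron j.
linearized = (D * pre) @ a
outputs_match = bool(np.allclose(relu_network(X, U, a), linearized))

# Each neuron lies in the cone {u : (2 D_j - I) X u >= 0}.
cone_ok = bool(np.all((2.0 * D - 1.0) * pre >= 0.0))
```

Under these assumptions, both `outputs_match` and `cone_ok` hold for any weights, which is what lets a fixed set of patterns turn the nonconvex ReLU expression into linear terms subject to cone constraints.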

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Machine Learning and ELM

Methods: Weight Decay