Global Optimality Beyond Two Layers: Training Deep ReLU Networks via Convex Programs

Tolga Ergen; Mert Pilanci

arXiv:2110.05518 · cs.LG · January 14, 2022 · 5 cites

TL;DR

This paper presents a convex optimization framework for training deep ReLU networks, revealing a hidden regularization mechanism and enabling global optimality guarantees beyond two layers.

Contribution

The paper introduces a novel convex reformulation of training deep ReLU networks, provides theoretical guarantees of global optimality, and interprets these networks as high-dimensional feature selectors.

Findings

01. Convex reformulation of multi-layer ReLU training

02. Global optimality with polynomial complexity

03. ReLU networks as feature selection methods

Abstract

Understanding the fundamental mechanism behind the success of deep neural networks is one of the key challenges in the modern machine learning literature. Despite numerous attempts, a solid theoretical analysis is yet to be developed. In this paper, we develop a novel unified framework to reveal a hidden regularization mechanism through the lens of convex optimization. We first show that the training of multiple three-layer ReLU sub-networks with weight decay regularization can be equivalently cast as a convex optimization problem in a higher dimensional space, where sparsity is enforced via a group $\ell_1$-norm regularization. Consequently, ReLU networks can be interpreted as high dimensional feature selection methods. More importantly, we then prove that the equivalent convex problem can be globally optimized by a standard convex optimization solver with a polynomial-time complexity…
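To make the "group $\ell_1$-norm as feature selection" idea concrete, here is a minimal sketch of a generic group-lasso least-squares problem solved by proximal gradient descent. It is not the paper's convex program: the objective, the grouping, the data, and the names `group_soft_threshold`, `group_lasso`, and `demo` are all illustrative assumptions. It only shows the mechanism the abstract refers to, namely that a group $\ell_1$ penalty drives entire blocks of variables to exactly zero.

```python
import numpy as np


def group_soft_threshold(w, threshold):
    """Proximal operator of threshold * ||w||_2 (block soft-thresholding)."""
    norm = np.linalg.norm(w)
    if norm <= threshold:
        return np.zeros_like(w)  # the whole group is switched off
    return (1.0 - threshold / norm) * w


def group_lasso(X, y, groups, lam, n_iters=500):
    """Proximal gradient for 0.5 * ||X w - y||^2 + lam * sum_g ||w_g||_2."""
    w = np.zeros(X.shape[1])
    # Step size 1/L, where L = ||X||_2^2 is the Lipschitz constant of the gradient.
    step = 1.0 / (np.linalg.norm(X, 2) ** 2)
    for _ in range(n_iters):
        z = w - step * (X.T @ (X @ w - y))  # gradient step on the smooth part
        for g in groups:                    # prox step, applied group by group
            w[g] = group_soft_threshold(z[g], step * lam)
    return w


def demo():
    """Synthetic example (assumed data): only the first group is truly active."""
    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 6))
    w_true = np.array([2.0, -1.0, 0.0, 0.0, 0.0, 0.0])
    y = X @ w_true
    groups = [[0, 1], [2, 3], [4, 5]]
    return group_lasso(X, y, groups, lam=5.0)


if __name__ == "__main__":
    print(demo())
```

In this toy setting the penalty keeps the first group (slightly shrunk) and discards the other two as whole blocks, which is the sense in which a group $\ell_1$ regularizer acts as a feature selector.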

Videos

Global Optimality Beyond Two Layers: Training Deep ReLU Networks via Convex Programs (SlidesLive)

Taxonomy

Topics: Sparse and Compressive Sensing Techniques · Machine Learning and ELM · Face and Expression Recognition

Methods: Feature Selection · Weight Decay