Global Optimality Beyond Two Layers: Training Deep ReLU Networks via Convex Programs
Tolga Ergen, Mert Pilanci

TL;DR
This paper presents a convex optimization framework for training deep ReLU networks, revealing a hidden regularization mechanism and enabling global optimality guarantees beyond two layers.
Contribution
It introduces a novel convex reformulation of deep ReLU network training, providing theoretical guarantees of global optimality and interpreting networks as high-dimensional feature selectors.
Findings
Convex reformulation of multi-layer ReLU training
Global optimality with polynomial-time complexity
ReLU networks as feature selection methods
Abstract
Understanding the fundamental mechanism behind the success of deep neural networks is one of the key challenges in the modern machine learning literature. Despite numerous attempts, a solid theoretical analysis is yet to be developed. In this paper, we develop a novel unified framework to reveal a hidden regularization mechanism through the lens of convex optimization. We first show that the training of multiple three-layer ReLU sub-networks with weight decay regularization can be equivalently cast as a convex optimization problem in a higher dimensional space, where sparsity is enforced via a group ℓ1-norm regularization. Consequently, ReLU networks can be interpreted as high dimensional feature selection methods. More importantly, we then prove that the equivalent convex problem can be globally optimized by a standard convex optimization solver with a polynomial-time complexity…
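The idea that a group ℓ1-norm penalty turns training into feature selection can be illustrated with a small sketch. This is NOT the paper's three-layer convex program. It assumes a simplified two-layer, gated-ReLU-style setting: fixed activation patterns D_i = diag(1[X u_i >= 0]) from hypothetical random directions u_i define feature blocks D_i X, and a group lasso, min_w 0.5 ||A w - y||^2 + beta * sum_i ||w_i||_2, is solved by proximal gradient descent (ISTA). The helper names (`group_l1_norm`, `group_soft_threshold`, `group_lasso`) and all sizes and parameters are illustrative assumptions. The point is only that the penalty zeros out whole blocks, so the surviving blocks are the "selected" features.

```python
import numpy as np

def group_l1_norm(w, groups):
    """Group l1 norm: sum over groups of the Euclidean norm of each block."""
    return sum(np.linalg.norm(w[g]) for g in groups)

def group_soft_threshold(w, groups, t):
    """Proximal operator of t * group_l1_norm: shrink each block's norm by t,
    setting the whole block to zero when its norm is at most t."""
    out = np.zeros_like(w)
    for g in groups:
        norm = np.linalg.norm(w[g])
        if norm > t:
            out[g] = (1.0 - t / norm) * w[g]
    return out

def group_lasso(A, y, groups, beta, n_iter=2000):
    """Proximal gradient (ISTA) for 0.5*||A w - y||^2 + beta * sum_g ||w_g||_2."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2  # 1 / Lipschitz constant of the gradient
    w = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ w - y)
        w = group_soft_threshold(w - step * grad, groups, step * beta)
    return w

# Toy data (assumption): targets produced by a single ReLU neuron.
rng = np.random.default_rng(0)
n, d, P = 40, 3, 8
X = rng.standard_normal((n, d))
y = np.maximum(X @ rng.standard_normal(d), 0.0)

# Hypothetical activation patterns from random directions u_i (a simplification;
# the paper's construction for deeper networks differs).
masks = [(X @ rng.standard_normal(d) >= 0).astype(float) for _ in range(P)]
A = np.hstack([m[:, None] * X for m in masks])  # blocks D_i X side by side
groups = [slice(i * d, (i + 1) * d) for i in range(P)]

w = group_lasso(A, y, groups, beta=1.0)
active = [i for i, g in enumerate(groups) if np.linalg.norm(w[g]) > 0]
print("selected blocks:", active)
```

The indices in `active` are the blocks that survive thresholding. Larger `beta` tends to zero out more blocks, and once `beta` exceeds the largest block norm of `A_i^T y` every block is exactly zero.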
Taxonomy
Topics: Sparse and Compressive Sensing Techniques · Machine Learning and ELM · Face and Expression Recognition
Methods: Feature Selection · Weight Decay
