Implicit Bias in Leaky ReLU Networks Trained on High-Dimensional Data
Spencer Frei, Gal Vardi, Peter L. Bartlett, Nathan Srebro, Wei Hu

TL;DR
This paper shows, theoretically and empirically, that gradient-based training of two-layer leaky ReLU neural networks on high-dimensional, nearly-orthogonal data implicitly biases the networks toward low-rank solutions with max-margin properties.
Contribution
The paper characterizes the implicit bias of gradient flow and gradient descent toward low-rank, max-margin solutions in high-dimensional settings, and shows that the scale of the random initialization plays a key role.
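For reference, the objects these results concern can be written out as below. This is a sketch assuming the setup that is standard in this line of work: the rows $w_j^\top$ of $W$ are trained, the second-layer weights $a_j$ are fixed, and the loss is logistic. The symbols are our own labels rather than quotations from the text above, and the paper's exact normalizations may differ. Gradient flow is the continuous-time limit $\tfrac{dW}{dt} = -\nabla \widehat{L}(W)$ of the gradient descent update, and the stable rank is a continuous proxy for rank.

```latex
\begin{aligned}
f(x; W) &= \sum_{j=1}^{m} a_j \, \phi\big(\langle w_j, x \rangle\big),
  \qquad \phi(z) = \max(\gamma z, z), \quad \gamma \in (0, 1), \\
\widehat{L}(W) &= \frac{1}{n} \sum_{i=1}^{n} \ell\big(y_i f(x_i; W)\big),
  \qquad \ell(q) = \log\big(1 + e^{-q}\big), \\
W^{(t+1)} &= W^{(t)} - \alpha \, \nabla \widehat{L}\big(W^{(t)}\big),
  \qquad \operatorname{srank}(W) = \frac{\|W\|_F^2}{\|W\|_2^2} \le \operatorname{rank}(W).
\end{aligned}
```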
Findings
Gradient flow asymptotically produces networks of rank at most two that have max-margin properties.
With a sufficiently small initialization, a single step of gradient descent substantially reduces the rank of the network.
A small initialization scale is crucial for low-rank solutions to emerge.
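The one-step rank reduction can be illustrated with a small NumPy simulation. This is a minimal sketch under assumptions of our own, not the paper's experimental protocol: Gaussian inputs with d much larger than n (so samples are nearly orthogonal), random ±1 labels, a fixed second layer with entries ±1/√m, logistic loss, and only the first-layer matrix W trained. Rank is measured by the stable rank ‖W‖_F² / ‖W‖_2². The helper names (`one_gd_step`, `stable_rank`) and all parameter values are hypothetical.

```python
import numpy as np


def leaky_relu(z: np.ndarray, gamma: float) -> np.ndarray:
    # phi(z) = max(gamma * z, z) for 0 < gamma < 1
    return np.where(z > 0, z, gamma * z)


def leaky_relu_grad(z: np.ndarray, gamma: float) -> np.ndarray:
    # phi'(z) is 1 for positive inputs and gamma otherwise
    return np.where(z > 0, 1.0, gamma)


def stable_rank(W: np.ndarray) -> float:
    # ||W||_F^2 / ||W||_2^2, a continuous proxy for rank (between 1 and rank(W))
    s = np.linalg.svd(W, compute_uv=False)
    return float(np.sum(s**2) / s[0] ** 2)


def one_gd_step(n=50, d=2000, m=100, gamma=0.5, init_std=1e-6, lr=1.0, seed=0):
    """Return (stable rank at init, stable rank after one GD step).

    Hypothetical setup: Gaussian data with d >> n, random labels,
    fixed second layer a_j = +-1/sqrt(m), logistic loss, only W trained.
    """
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d))              # rows nearly orthogonal when d >> n
    y = rng.choice([-1.0, 1.0], size=n)
    a = rng.choice([-1.0, 1.0], size=m) / np.sqrt(m)
    W = init_std * rng.standard_normal((m, d))   # small random initialization

    Z = X @ W.T                                  # (n, m) pre-activations <w_j, x_i>
    f = leaky_relu(Z, gamma) @ a                 # (n,) network outputs
    g = 1.0 / (1.0 + np.exp(y * f))              # -l'(y_i f_i) for l(q) = log(1 + e^{-q})

    # Gradient of (1/n) sum_i l(y_i f(x_i)) w.r.t. row w_j:
    #   -(1/n) sum_i g_i y_i a_j phi'(<w_j, x_i>) x_i
    coeff = (g * y)[:, None] * leaky_relu_grad(Z, gamma) * a[None, :]  # (n, m)
    grad = -(coeff.T @ X) / n                    # (m, d)

    W_next = W - lr * grad
    return stable_rank(W), stable_rank(W_next)


before, after = one_gd_step()
print(f"stable rank at init: {before:.1f}, after one step: {after:.2f}")
```

With these settings the stable rank should drop from several dozen at initialization to close to 1 after one step. The intuition is that when W is tiny, every neuron's gradient shares a large component along the common direction Σᵢ yᵢxᵢ (scaled by aⱼ), so the update is nearly rank one and swamps the small random initialization.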
Abstract
The implicit biases of gradient-based optimization algorithms are conjectured to be a major factor in the success of modern deep learning. In this work, we investigate the implicit bias of gradient flow and gradient descent in two-layer fully-connected neural networks with leaky ReLU activations when the training data are nearly-orthogonal, a common property of high-dimensional data. For gradient flow, we leverage recent work on the implicit bias for homogeneous neural networks to show that asymptotically, gradient flow produces a neural network with rank at most two. Moreover, this network is an ℓ2-max-margin solution (in parameter space), and has a linear decision boundary that corresponds to an approximate-max-margin linear predictor. For gradient descent, provided the random initialization variance is small enough, we show that a single step of gradient descent suffices to…
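A quick way to see why high-dimensional data tend to be nearly orthogonal is to measure the largest cosine similarity between distinct samples as the dimension grows. The sketch below assumes i.i.d. standard Gaussian inputs and uses the maximum absolute cosine similarity as a simple proxy; this is not the paper's exact near-orthogonality condition, and `max_abs_cosine` is a hypothetical helper.

```python
import numpy as np


def max_abs_cosine(n: int, d: int, seed: int = 0) -> float:
    """Largest |cos(x_i, x_k)| over distinct pairs of n Gaussian samples in R^d."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d))
    G = X @ X.T                            # Gram matrix of inner products
    norms = np.sqrt(np.diag(G))
    C = G / np.outer(norms, norms)         # pairwise cosine similarities
    np.fill_diagonal(C, 0.0)               # ignore each sample's similarity to itself
    return float(np.max(np.abs(C)))


for d in (20, 200, 10_000):
    print(d, round(max_abs_cosine(50, d), 3))
```

For a fixed number of samples n, the maximum cosine similarity shrinks on the order of √(log n / d), so with d much larger than n the samples behave almost like an orthogonal set.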
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Machine Learning and ELM · Sparse and Compressive Sensing Techniques
