Implicit Bias in Leaky ReLU Networks Trained on High-Dimensional Data

Spencer Frei; Gal Vardi; Peter L. Bartlett; Nathan Srebro; Wei Hu

arXiv:2210.07082 · cs.LG · October 14, 2022 · 5 cites

TL;DR

This paper shows, both theoretically and empirically, that gradient-based training of two-layer leaky ReLU networks on high-dimensional, nearly-orthogonal data implicitly biases the networks toward low-rank solutions with max-margin properties.

Contribution

The paper characterizes the implicit bias of gradient flow and gradient descent toward low-rank, max-margin solutions in high-dimensional settings, and highlights the role of the initialization scale.

Findings

01. Asymptotically, gradient flow yields a network of rank at most two that is an $\ell_2$-max-margin solution in parameter space.

02. With a sufficiently small initialization variance, a single gradient descent step significantly reduces the rank of the network.

03. A small initialization scale is crucial for low-rank solutions to emerge.
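The effect described in findings 02 and 03 can be explored numerically. The following is a minimal sketch, not the authors' experimental setup: it assumes Gaussian inputs scaled to roughly unit norm, random ±1 labels, fixed second-layer weights, the logistic loss, and the stable rank (squared Frobenius norm over squared spectral norm) as a proxy for rank. The names `stable_rank` and `one_gd_step`, and all sizes, step sizes, and initialization scales, are hypothetical choices made for illustration.

```python
import numpy as np

def stable_rank(W):
    """Squared Frobenius norm divided by squared spectral norm (a rank proxy)."""
    return np.linalg.norm(W, "fro") ** 2 / np.linalg.norm(W, 2) ** 2

def one_gd_step(W, a, X, y, gamma=0.1, lr=1.0):
    """One full-batch gradient descent step on the logistic loss for
    f(x) = sum_j a_j * leaky_relu(<w_j, x>), updating only the first layer W.
    gamma (leaky slope) and lr (step size) are illustrative assumptions."""
    pre = X @ W.T                                  # (n, m) pre-activations
    act = np.where(pre > 0, pre, gamma * pre)      # leaky ReLU
    slope = np.where(pre > 0, 1.0, gamma)          # derivative of leaky ReLU
    margins = y * (act @ a)                        # y_i * f(x_i)
    g = -y / (1.0 + np.exp(margins))               # d loss_i / d f(x_i)
    grad = ((g[:, None] * slope) * a[None, :]).T @ X / len(y)  # (m, d)
    return W - lr * grad

rng = np.random.default_rng(0)
n, d, m = 20, 2000, 50                             # d >> n: high-dimensional regime
X = rng.standard_normal((n, d)) / np.sqrt(d)       # rows have norm close to 1
y = rng.choice([-1.0, 1.0], size=n)                # assumed random labels
a = rng.choice([-1.0, 1.0], size=m) / np.sqrt(m)   # assumed fixed second layer

# Near-orthogonality check: off-diagonal Gram entries should be small.
gram = X @ X.T
max_offdiag = np.abs(gram - np.diag(np.diag(gram))).max()
print(f"largest |<x_i, x_j>| for i != j: {max_offdiag:.3f}")

results = {}
for init_std in (1e-6, 1.0):                       # small vs. large initialization
    W0 = init_std * rng.standard_normal((m, d))
    W1 = one_gd_step(W0, a, X, y)
    results[init_std] = (stable_rank(W0), stable_rank(W1))
    print(f"init_std={init_std:g}: stable rank "
          f"{results[init_std][0]:.1f} -> {results[init_std][1]:.1f}")
```

Under these assumptions, the small-initialization run should show the stable rank collapsing after one step, because the update dominates the initial weights, while the large-initialization run should stay close to its initial value. Varying `init_std` between the two extremes gives a rough picture of where the transition occurs in this toy setting.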

Abstract

The implicit biases of gradient-based optimization algorithms are conjectured to be a major factor in the success of modern deep learning. In this work, we investigate the implicit bias of gradient flow and gradient descent in two-layer fully-connected neural networks with leaky ReLU activations when the training data are nearly-orthogonal, a common property of high-dimensional data. For gradient flow, we leverage recent work on the implicit bias for homogeneous neural networks to show that asymptotically, gradient flow produces a neural network with rank at most two. Moreover, this network is an $\ell_2$-max-margin solution (in parameter space), and has a linear decision boundary that corresponds to an approximate-max-margin linear predictor. For gradient descent, provided the random initialization variance is small enough, we show that a single step of gradient descent suffices to…

Videos

Implicit Bias in Leaky ReLU Networks Trained on High-Dimensional Data · SlidesLive

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Machine Learning and ELM · Sparse and Compressive Sensing Techniques
