Towards Understanding Learning in Neural Networks with Linear Teachers

Roei Sarussi; Alon Brutzkus; Amir Globerson

arXiv:2101.02533 · cs.LG · July 29, 2021

TL;DR

This paper proves that stochastic gradient descent (SGD) globally optimizes a two-layer neural network with Leaky ReLU activations, trained with cross-entropy on linearly separable data. It also gives theoretical support for why the learned network often behaves approximately linearly.
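The learning problem in the TL;DR can be made concrete with a small NumPy sketch: data labeled by a linear teacher, a two-layer Leaky ReLU network, and plain SGD on the logistic (binary cross-entropy) loss. This is a minimal illustration under stated assumptions, not the authors' code or exact setup. The data generator, the output layer fixed to +1 on half the hidden units and −1 on the other half, the hyperparameters (`k`, `alpha`, `lr`, `epochs`, `margin`), and all function names are hypothetical choices.

```python
import numpy as np


def leaky_relu(z, alpha):
    # Leaky ReLU with slope alpha (0 < alpha < 1) on the negative side.
    return np.where(z >= 0, z, alpha * z)


def leaky_relu_grad(z, alpha):
    return np.where(z >= 0, 1.0, alpha)


def make_linear_teacher_data(n, d, margin, rng):
    # Hypothetical generator: unit-norm inputs labeled by a random linear teacher
    # w_star. Points within `margin` of the teacher's hyperplane are rejected,
    # so the sample is linearly separable with margin at least `margin`.
    w_star = rng.standard_normal(d)
    w_star /= np.linalg.norm(w_star)
    X = []
    while len(X) < n:
        x = rng.standard_normal(d)
        x /= np.linalg.norm(x)
        if abs(x @ w_star) >= margin:
            X.append(x)
    X = np.array(X)
    return X, np.sign(X @ w_star), w_star


def network(W, v, X, alpha):
    # Two-layer net: f(x) = sum_j v_j * leaky_relu(w_j . x).
    return leaky_relu(X @ W.T, alpha) @ v


def logistic_loss(W, v, X, y, alpha):
    # Binary cross-entropy for labels in {-1, +1}: mean of log(1 + exp(-y f(x))).
    return np.mean(np.logaddexp(0.0, -y * network(W, v, X, alpha)))


def train_sgd(X, y, k=10, alpha=0.2, lr=0.05, epochs=50, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Assumption: 2k hidden units with the output layer fixed to +1 (first k)
    # and -1 (last k); only the first-layer weights W are trained.
    v = np.concatenate([np.ones(k), -np.ones(k)])
    W = 0.1 * rng.standard_normal((2 * k, d))  # small random initialization
    history = [logistic_loss(W, v, X, y, alpha)]
    for _ in range(epochs):
        for i in rng.permutation(n):  # one example per SGD step
            z = W @ X[i]
            f = leaky_relu(z, alpha) @ v
            # d/df log(1 + exp(-y f)) = -y * sigmoid(-y f), written stably via tanh.
            g = -y[i] * 0.5 * (1.0 - np.tanh(0.5 * y[i] * f))
            # Chain rule: df/dw_j = v_j * sigma'(w_j . x) * x.
            W -= lr * g * np.outer(v * leaky_relu_grad(z, alpha), X[i])
        history.append(logistic_loss(W, v, X, y, alpha))
    return W, v, history


rng = np.random.default_rng(1)
X, y, w_star = make_linear_teacher_data(n=200, d=10, margin=0.1, rng=rng)
W, v, history = train_sgd(X, y)
train_acc = np.mean(np.sign(network(W, v, X, 0.2)) == y)
print(f"loss {history[0]:.3f} -> {history[-1]:.3f}, train accuracy {train_acc:.3f}")
```

Fixing the second layer keeps the sketch to a single trainable matrix. Because Leaky ReLU has a nonzero slope everywhere, no hidden unit ever receives a zero gradient.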

Contribution

It provides the first proof of global optimization in this setting and shows that if the network weights converge to two clusters, the decision boundary is approximately linear.

Findings

1. SGD globally optimizes the learning problem.
2. The learned networks often turn out to be approximately linear (an empirical observation).
3. Convergence of the network weights to two clusters implies an approximately linear decision boundary.
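The simplest case shows why two clusters force a linear boundary. This is an illustrative calculation, not the paper's theorem. It assumes a hypothetical output layer fixed to +1 on k units (weights w_i) and −1 on k units (weights u_i), Leaky ReLU σ(z) = max(z, αz) with 0 < α < 1, and weights that have collapsed exactly onto two points, w_i = w^+ and u_i = w^-:

```latex
\begin{aligned}
f(x) &= \sum_{i=1}^{k} \sigma(w_i^\top x) - \sum_{i=1}^{k} \sigma(u_i^\top x)
      = k\left[\sigma\big((w^{+})^\top x\big) - \sigma\big((w^{-})^\top x\big)\right],\\
\operatorname{sign} f(x) &= \operatorname{sign}\big((w^{+} - w^{-})^\top x\big)
  \quad\text{since } \sigma \text{ is strictly increasing for } \alpha > 0 .
\end{aligned}
```

The decision boundary is therefore exactly the hyperplane (w^+ − w^-)ᵀx = 0. When the weights only lie near the two cluster centers, f stays close to this expression, which suggests why approximate clustering gives an approximately linear boundary.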

Abstract

Can a neural network minimizing cross-entropy learn linearly separable data? Despite progress in the theory of deep learning, this question remains unsolved. Here we prove that SGD globally optimizes this learning problem for a two-layer network with Leaky ReLU activations. The learned network can in principle be very complex. However, empirical evidence suggests that it often turns out to be approximately linear. We provide theoretical support for this phenomenon by proving that if network weights converge to two weight clusters, this will imply an approximately linear decision boundary. Finally, we show a condition on the optimization that leads to weight clustering. We provide empirical results that validate our theoretical analysis.
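The abstract's claim that two weight clusters imply an approximately linear decision boundary can be probed numerically. Because Leaky ReLU is strictly increasing, σ(a) > σ(b) exactly when a > b. With exact clusters, the sign of the network should therefore match the linear rule sign((w^+ − w^-)ᵀx) on every input. The sketch below is hypothetical throughout: the ±1 output layer, α = 0.2, Gaussian inputs, the random cluster centers `w_plus` and `w_minus`, and the perturbation scale `eps` are illustrative assumptions, and it checks only the implication, not the paper's condition for clustering to occur.

```python
import numpy as np


def leaky_relu(z, alpha=0.2):
    return np.where(z >= 0, z, alpha * z)


def two_layer(W_pos, W_neg, X, alpha=0.2):
    # f(x) = sum_i sigma(w_i . x) - sum_i sigma(u_i . x)  (assumed +1/-1 output layer)
    pos = leaky_relu(X @ W_pos.T, alpha).sum(axis=1)
    neg = leaky_relu(X @ W_neg.T, alpha).sum(axis=1)
    return pos - neg


def agreement_with_linear(W_pos, W_neg, w_plus, w_minus, X):
    # Fraction of inputs where the network and sign((w_plus - w_minus) . x) agree.
    net = np.sign(two_layer(W_pos, W_neg, X))
    lin = np.sign(X @ (w_plus - w_minus))
    return np.mean(net == lin)


rng = np.random.default_rng(0)
d, k, n = 10, 10, 5000
w_plus, w_minus = rng.standard_normal(d), rng.standard_normal(d)  # hypothetical centers
X = rng.standard_normal((n, d))

# Exact clusters: every positive unit equals w_plus, every negative unit equals w_minus.
exact = agreement_with_linear(
    np.tile(w_plus, (k, 1)), np.tile(w_minus, (k, 1)), w_plus, w_minus, X
)

# Approximate clusters: each unit is a small random perturbation of its center.
eps = 0.05
noisy = agreement_with_linear(
    w_plus + eps * rng.standard_normal((k, d)),
    w_minus + eps * rng.standard_normal((k, d)),
    w_plus,
    w_minus,
    X,
)
print(f"exact clusters: {exact:.4f}, perturbed clusters: {noisy:.4f}")
```

Agreement should be complete for identical clusters and only slightly lower once the units spread out. Any disagreements should fall on inputs close to the hyperplane (w^+ − w^-)ᵀx = 0, where the small perturbations can flip the sign.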

Videos

Towards Understanding Learning in Neural Networks with Linear Teachers (talk video, SlidesLive)

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning

Methods: Stochastic Gradient Descent