How Neural Networks Learn the Support is an Implicit Regularization Effect of SGD

Pierfrancesco Beneventano; Andrea Pinto; Tomaso Poggio

arXiv:2406.11110·cs.LG·June 18, 2024



TL;DR

This paper demonstrates that mini-batch SGD implicitly regularizes neural networks to learn the support of the target function by shrinking irrelevant weights, unlike vanilla GD which needs explicit regularization.

Contribution

It reveals a second-order implicit regularization effect of mini-batch SGD that enhances feature interpretability and reduces initialization dependence.

Findings

01. Mini-batch SGD learns support by shrinking irrelevant weights.

02. Vanilla GD requires explicit regularization to learn support.

03. Smaller batch sizes improve feature interpretability.

Abstract

We investigate the ability of deep neural networks to identify the support of the target function. Our findings reveal that mini-batch SGD effectively learns the support in the first layer of the network by shrinking to zero the weights associated with irrelevant components of input. In contrast, we demonstrate that while vanilla GD also approximates the target function, it requires an explicit regularization term to learn the support in the first layer. We prove that this property of mini-batch SGD is due to a second-order implicit regularization effect which is proportional to $\eta / b$ (step size / batch size). Our results are not only another proof that implicit regularization has a significant impact on training optimization dynamics but they also shed light on the structure of the features that are learned by the network. Additionally, they suggest that smaller batches enhance…
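The comparison described in the abstract can be probed with a small experiment. The following is a minimal sketch and not the authors' setup: the two-layer ReLU network, the synthetic target (which depends only on the first input coordinate), and all hyperparameters and function names (train_two_layer, irrelevant_column_norm) are assumptions chosen for illustration. It trains the same network with small batches and with the full batch, then reports the norm of the first-layer weights attached to the irrelevant input coordinates. The sketch only makes the quantity of interest measurable; it is not guaranteed to reproduce the paper's results.

```python
import numpy as np

def train_two_layer(X, y, batch_size, lr=0.02, steps=2000, hidden=16, seed=0):
    """Train f(x) = a . relu(W x) on the loss 0.5 * mean(err^2).

    Returns the first-layer weight matrix W of shape (hidden, d).
    Architecture and hyperparameters are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.normal(scale=0.5, size=(hidden, d))
    a = rng.normal(scale=0.5, size=hidden)
    for _ in range(steps):
        # Full-batch GD when batch_size >= n, otherwise a random mini-batch.
        if batch_size >= n:
            idx = np.arange(n)
        else:
            idx = rng.choice(n, size=batch_size, replace=False)
        Xb, yb = X[idx], y[idx]
        pre = Xb @ W.T                      # pre-activations, (b, hidden)
        h = np.maximum(pre, 0.0)            # ReLU features
        err = h @ a - yb                    # residuals, (b,)
        grad_a = h.T @ err / len(idx)
        grad_pre = (err[:, None] * a[None, :]) * (pre > 0)
        grad_W = grad_pre.T @ Xb / len(idx)
        a -= lr * grad_a
        W -= lr * grad_W
    return W

def irrelevant_column_norm(W, relevant_dims):
    """Frobenius norm of the columns of W for inputs outside the support."""
    mask = np.ones(W.shape[1], dtype=bool)
    mask[list(relevant_dims)] = False
    return float(np.linalg.norm(W[:, mask]))

# Synthetic data (an assumption): 6 input coordinates, only the first matters.
rng = np.random.default_rng(1)
X = rng.normal(size=(256, 6))
y = np.tanh(X[:, 0])

for b in (4, 256):  # small mini-batch vs. full batch, same step size
    W = train_two_layer(X, y, batch_size=b)
    print(f"batch size {b}: irrelevant-weight norm = "
          f"{irrelevant_column_norm(W, [0]):.4f}")
```

Because the effect is stated to scale with step size / batch size, varying lr and batch_size together (for example, holding their ratio fixed) is a natural follow-up to run with this sketch.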


Taxonomy

Topics: Medical Imaging and Analysis

Methods: Stochastic Gradient Descent