Disentangling the Mechanisms Behind Implicit Regularization in SGD

Zachary Novack; Simran Kaur; Tanya Marwah; Saurabh Garg; Zachary C. Lipton

arXiv:2211.15853·cs.LG·November 30, 2022


TL;DR

This paper empirically investigates why small-batch SGD generalizes better than large-batch SGD, focusing on the role of implicit regularization and on how different explicit regularizers affect generalization across datasets.

Contribution

It provides the first extensive empirical evaluation of various hypotheses on implicit regularization in SGD, highlighting the effectiveness of gradient norm and Fisher information regularizations.

Findings

01

Explicit regularization of gradient norm and Fisher trace recovers small-batch generalization.

02

Jacobian regularizations do not replicate small-batch benefits.

03

Regularization effects vary across datasets like CIFAR10 and CIFAR100.

Abstract

A number of competing hypotheses have been proposed to explain why small-batch Stochastic Gradient Descent (SGD) leads to improved generalization over the full-batch regime, with recent work crediting the implicit regularization of various quantities throughout training. However, to date, empirical evidence assessing the explanatory power of these hypotheses is lacking. In this paper, we conduct an extensive empirical evaluation, focusing on the ability of various theorized mechanisms to close the small-to-large batch generalization gap. Additionally, we characterize how the quantities that SGD has been claimed to (implicitly) regularize change over the course of training. By using micro-batches, i.e., disjoint smaller subsets of each mini-batch, we empirically show that explicitly penalizing the gradient norm or the Fisher Information Matrix trace, averaged over micro-batches, in the…
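To make the micro-batch idea in the abstract concrete, here is a minimal sketch of a gradient-norm penalty averaged over disjoint micro-batches. It is not taken from the paper or its repository. It assumes a toy linear model with a squared-error loss, uses the squared gradient norm as the penalized quantity, and introduces hypothetical names (`split_micro_batches`, `avg_micro_batch_grad_norm_sq`, `micro_size`) purely for illustration.

```python
def split_micro_batches(batch, micro_size):
    """Split a mini-batch into disjoint, consecutive micro-batches."""
    return [batch[i:i + micro_size] for i in range(0, len(batch), micro_size)]


def mse_grad(w, examples):
    """Gradient w.r.t. w of the mean of 0.5 * (w.x - y)^2 over the examples.

    Assumption: a toy linear model stands in for the network.
    """
    grad = [0.0] * len(w)
    for x, y in examples:
        residual = sum(wi * xi for wi, xi in zip(w, x)) - y
        for j, xj in enumerate(x):
            grad[j] += residual * xj / len(examples)
    return grad


def avg_micro_batch_grad_norm_sq(w, batch, micro_size):
    """Average squared gradient norm over the micro-batches of one mini-batch.

    Assumption: the squared norm is used here for simplicity.
    """
    micro_batches = split_micro_batches(batch, micro_size)
    total = 0.0
    for micro_batch in micro_batches:
        g = mse_grad(w, micro_batch)
        total += sum(gi * gi for gi in g)
    return total / len(micro_batches)


# Toy mini-batch of four (features, target) pairs.
batch = [([1.0, 0.0], 1.0), ([1.0, 0.0], 1.0),
         ([0.0, 1.0], 2.0), ([0.0, 1.0], 2.0)]
w = [0.0, 0.0]

penalty_micro = avg_micro_batch_grad_norm_sq(w, batch, micro_size=2)
penalty_full = avg_micro_batch_grad_norm_sq(w, batch, micro_size=4)
print(penalty_micro, penalty_full)
```

In this toy case the micro-batch penalty (2.5) exceeds the penalty computed on the whole mini-batch (1.25), because averaging gradients before taking the norm cancels the disagreement between micro-batches. In a training loop, such a term would be scaled by a coefficient and added to the mini-batch loss; the coefficient and the choice of norm are assumptions to tune, not values from the paper.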


Code & Models

Repositories

zacharynovack/imp-regularizers-arxiv
PyTorch · Official

Videos

Disentangling the Mechanisms Behind Implicit Regularization in SGD · SlidesLive

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Stochastic Gradient Optimization Techniques

Methods: fail · Test · Stochastic Gradient Descent