Bias of Stochastic Gradient Descent or the Architecture: Disentangling the Effects of Overparameterization of Neural Networks

Amit Peleg; Matthias Hein

arXiv:2407.03848 · cs.LG · February 4, 2025

TL;DR

This paper investigates how overparameterization affects neural network generalization. It separates the role of SGD bias from that of architectural bias by comparing random and SGD-optimized networks that reach zero training error, across varying width and depth.

Contribution

It disentangles the effects of optimization bias and architectural bias on generalization in overparameterized neural networks.

Findings

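1. In the low-sample regime, overparameterization in width benefits generalization, and the benefit is due to SGD bias rather than architectural bias.

2. Overparameterization in depth harms generalization, an effect linked to architectural bias.

3. SGD bias drives the effect of width on generalization, while the architecture dominates the effect of depth.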

Abstract

Neural networks typically generalize well when fitting the data perfectly, even though they are heavily overparameterized. Many factors have been pointed out as the reason for this phenomenon, including an implicit bias of stochastic gradient descent (SGD) and a possible simplicity bias arising from the neural network architecture. The goal of this paper is to disentangle the factors that influence generalization stemming from optimization and architectural choices by studying random and SGD-optimized networks that achieve zero training error. We experimentally show, in the low sample regime, that overparameterization in terms of increasing width is beneficial for generalization, and this benefit is due to the bias of SGD and not due to an architectural bias. In contrast, for increasing depth, overparameterization is detrimental for generalization, but random and SGD-optimized networks…
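The comparison the abstract describes, between random networks and SGD-optimized networks that both reach zero training error, can be illustrated with a toy sketch. This is not the authors' code or experimental setup. The dataset, the uniform weight prior on [-1, 1], the network width, the learning rate, and all function names below are assumptions chosen for illustration only.

```python
import numpy as np

def init_params(rng, width, in_dim=2):
    # Hypothetical prior: i.i.d. uniform weights in [-1, 1] (an assumption).
    return {
        "W1": rng.uniform(-1, 1, size=(in_dim, width)),
        "b1": rng.uniform(-1, 1, size=width),
        "w2": rng.uniform(-1, 1, size=width),
        "b2": rng.uniform(-1, 1),
    }

def forward(p, X):
    # One-hidden-layer ReLU network returning one logit per input row.
    h = np.maximum(X @ p["W1"] + p["b1"], 0.0)
    return h @ p["w2"] + p["b2"]

def error(p, X, y):
    # Fraction of points whose predicted class (logit > 0) differs from y.
    return np.mean((forward(p, X) > 0).astype(int) != y)

def sample_random_interpolator(rng, X, y, width, max_tries=100_000):
    # Draw networks from the prior until one fits the training set exactly.
    for _ in range(max_tries):
        p = init_params(rng, width)
        if error(p, X, y) == 0.0:
            return p
    raise RuntimeError("no random network with zero training error found")

def sgd_interpolator(rng, X, y, width, lr=0.1, max_steps=50_000):
    # Plain single-sample SGD on the logistic loss, stopped at zero training error.
    p = init_params(rng, width)
    for _ in range(max_steps):
        if error(p, X, y) == 0.0:
            return p
        i = rng.integers(len(X))
        x, t = X[i], y[i]
        pre = x @ p["W1"] + p["b1"]
        h = np.maximum(pre, 0.0)
        logit = h @ p["w2"] + p["b2"]
        g = 1.0 / (1.0 + np.exp(-logit)) - t   # d(loss)/d(logit)
        gh = g * p["w2"] * (pre > 0)           # gradient at the hidden pre-activations
        p["w2"] -= lr * g * h
        p["b2"] -= lr * g
        p["W1"] -= lr * np.outer(x, gh)
        p["b1"] -= lr * gh
    raise RuntimeError("SGD did not reach zero training error")

# Toy low-sample problem (made up): the label is 1 when the first coordinate is positive.
X_train = np.array([[-0.8, 0.3], [-0.5, -0.6], [-0.3, 0.7],
                    [0.4, -0.2], [0.6, 0.8], [0.9, -0.5]])
y_train = np.array([0, 0, 0, 1, 1, 1])

rng = np.random.default_rng(0)
X_test = rng.uniform(-1, 1, size=(2000, 2))
y_test = (X_test[:, 0] > 0).astype(int)

p_rand = sample_random_interpolator(rng, X_train, y_train, width=8)
p_sgd = sgd_interpolator(rng, X_train, y_train, width=8)

print("random network test error:", error(p_rand, X_test, y_test))
print("SGD network test error:   ", error(p_sgd, X_test, y_test))
```

Both networks fit the six training points exactly, so any gap in test error reflects how each procedure selects among interpolating solutions. A single run on a toy problem like this says nothing about the paper's findings. A meaningful comparison would average over many sampled networks and sweep the width and depth.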

Taxonomy

Topics: Neural Networks and Applications · Stochastic Gradient Optimization Techniques

Methods: Stochastic Gradient Descent