Implicit Regularization and Generalization in Overparameterized Neural Networks
Zeran Johannsen

TL;DR
This paper explores how optimization dynamics and implicit regularization enable overparameterized neural networks to generalize well, contrary to the predictions of classical statistical learning theory.
Contribution
It provides controlled experiments analyzing how SGD batch size, loss-landscape geometry, the Neural Tangent Kernel (NTK) regime, double descent, and lottery-ticket pruning affect generalization.
Findings
Smaller batch sizes lead to flatter minima and lower test error.
Sparse subnetworks can match full-model performance with only 10% of the parameters.
Flatter minima, indicated by smaller Hessian eigenvalues, correlate with better generalization.
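As an illustration of how the sharpness of a minimum can be measured, the sketch below estimates the largest Hessian eigenvalue by power iteration on Hessian-vector products. It is a minimal, dependency-free example and not the paper's implementation: the toy quadratic loss, the finite-difference Hessian-vector product, and all function names are assumptions made for illustration. In a PyTorch setting the Hessian-vector product would more typically come from automatic differentiation.

```python
import math
import random

def grad_quadratic(w, curvatures):
    """Gradient of the toy loss 0.5 * sum(c_i * w_i**2) (illustrative only)."""
    return [c * x for c, x in zip(curvatures, w)]

def hessian_vector_product(grad_fn, w, v, eps=1e-4):
    """Approximate H @ v with a central difference of gradients along v."""
    w_plus = [x + eps * d for x, d in zip(w, v)]
    w_minus = [x - eps * d for x, d in zip(w, v)]
    g_plus = grad_fn(w_plus)
    g_minus = grad_fn(w_minus)
    return [(a - b) / (2 * eps) for a, b in zip(g_plus, g_minus)]

def top_hessian_eigenvalue(grad_fn, w, iters=200, seed=0):
    """Power iteration: returns the largest-magnitude Hessian eigenvalue at w."""
    rng = random.Random(seed)
    v = [rng.gauss(0, 1) for _ in w]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    eig = 0.0
    for _ in range(iters):
        hv = hessian_vector_product(grad_fn, w, v)
        # Rayleigh quotient v^T H v for the current unit vector v.
        eig = sum(a * b for a, b in zip(v, hv))
        norm = math.sqrt(sum(x * x for x in hv))
        if norm == 0.0:
            break
        v = [x / norm for x in hv]
    return eig

# Two hypothetical minima with the same location but different curvature.
w_star = [0.3, -0.2, 0.1]
sharp = top_hessian_eigenvalue(lambda w: grad_quadratic(w, [10.0, 1.0, 0.5]), w_star)
flat = top_hessian_eigenvalue(lambda w: grad_quadratic(w, [0.5, 0.2, 0.1]), w_star)
```

Under these assumptions the flatter minimum yields the smaller estimate. Power iteration converges to the eigenvalue of largest magnitude, which matches the largest eigenvalue only when the Hessian has no dominant negative curvature, as is typically assumed near a minimum.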
Abstract
Classical statistical learning theory predicts that overparameterized models should exhibit severe overfitting, yet modern deep neural networks with far more parameters than training samples consistently generalize well. This contradiction has become a central theoretical question in machine learning. This study investigates the role of optimization dynamics and implicit regularization in enabling generalization in overparameterized neural networks through controlled experiments. We examine stochastic gradient descent (SGD) across batch sizes, the geometry of flat versus sharp minima via Hessian eigenvalue estimation and weight perturbation analysis, the Neural Tangent Kernel (NTK) regime through wide-network experiments, double descent across model scales, and the Lottery Ticket Hypothesis through iterative magnitude pruning. All experiments use PyTorch on CIFAR-10 and MNIST with…
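The iterative magnitude pruning procedure mentioned in the abstract can be sketched as follows. This is a simplified illustration rather than the paper's code: the flat weight list, the `train_fn` callback, the per-round pruning fraction, and the choice to rewind surviving weights to their initial values each round are all assumptions, since the excerpt does not state the exact schedule.

```python
def magnitude_prune_step(weights, mask, fraction):
    """Return a new mask with the `fraction` smallest-magnitude active weights removed."""
    active = [i for i, m in enumerate(mask) if m]
    n_prune = int(len(active) * fraction)
    to_prune = sorted(active, key=lambda i: abs(weights[i]))[:n_prune]
    new_mask = list(mask)
    for i in to_prune:
        new_mask[i] = 0
    return new_mask

def iterative_magnitude_pruning(init_weights, train_fn, rounds, fraction):
    """Alternate training and pruning, rewinding to the initial weights each round.

    `train_fn(weights, mask)` is a hypothetical callback that trains the masked
    network and returns the trained weights as a list.
    """
    mask = [1] * len(init_weights)
    for _ in range(rounds):
        # Rewind: restart from the original initialization, keeping only survivors.
        weights = [w * m for w, m in zip(init_weights, mask)]
        trained = train_fn(weights, mask)
        mask = magnitude_prune_step(trained, mask, fraction)
    return mask

# Single pruning step on four hypothetical weights.
step_mask = magnitude_prune_step([0.1, -2.0, 0.5, -0.05], [1, 1, 1, 1], 0.5)

# Two rounds with an identity "training" stand-in, for illustration only.
init = [float(i + 1) for i in range(10)]
final_mask = iterative_magnitude_pruning(init, lambda w, m: w, rounds=2, fraction=0.5)
```

In a real experiment `train_fn` would run SGD on the masked network, and the surviving fraction after the final round would be compared against the dense model's test accuracy.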
