A Theory of Generalization in Deep Learning
Elon Litman, Gabe Guo

TL;DR
This paper develops a non-asymptotic theory of generalization in deep learning in which the empirical neural tangent kernel partitions the output space into signal and noise directions, explaining phenomena such as benign overfitting and double descent.
Contribution
It introduces a theoretical framework that accounts for feature learning, generalization, and implicit bias in deep neural networks, and derives an exact population-risk objective.
Findings
Generalization survives even when the empirical neural tangent kernel evolves in operator norm (the full feature-learning regime).
The theory explains phenomena like benign overfitting, double descent, and grokking.
A new population-risk objective improves training efficiency and robustness.
Abstract
We present a non-asymptotic theory of generalization in deep learning where the empirical neural tangent kernel partitions the output space. In directions corresponding to signal, error dissipates rapidly; in the vast orthogonal dimensions corresponding to noise, the kernel's near-zero eigenvalues trap residual error in a test-invisible reservoir. Within the signal channel, minibatch SGD ensures that coherent population signal accumulates via fast linear drift, while idiosyncratic memorization is suppressed into a slow, diffusive random walk. We prove generalization survives even when the kernel evolves in operator norm, the full feature-learning regime. This theory naturally explains disparate phenomena in deep learning theory, such as benign overfitting, double descent, implicit bias, and grokking. Lastly, we derive an exact population-risk objective from a single…
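The signal/noise partition described in the abstract can be illustrated with a minimal NumPy sketch. It is not the paper's construction: it assumes a fixed (non-evolving) kernel, squared loss, and continuous-time gradient flow, under which the training residual obeys r(t) = exp(-lr * t * K) r(0). The kernel spectrum, the sizes `n` and `k`, and the helper name `residual_under_kernel_flow` are all hypothetical choices made for illustration.

```python
import numpy as np

def residual_under_kernel_flow(K, r0, lr, t):
    """Residual r(t) = exp(-lr * t * K) @ r0 for a fixed symmetric PSD kernel K.

    Assumption: squared loss and gradient flow with a frozen kernel, so each
    eigen-component of the residual decays as exp(-lr * t * eigenvalue).
    """
    eigvals, eigvecs = np.linalg.eigh(K)
    coeffs = eigvecs.T @ r0                      # residual in the kernel eigenbasis
    decayed = np.exp(-lr * t * eigvals) * coeffs  # per-direction exponential decay
    return eigvecs @ decayed

rng = np.random.default_rng(0)
n, k = 50, 3  # hypothetical: n outputs, k "signal" directions

# Hypothetical kernel: k large eigenvalues (signal), n - k near-zero (noise).
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
spectrum = np.concatenate([np.full(k, 10.0), np.full(n - k, 1e-6)])
K = (Q * spectrum) @ Q.T
K = (K + K.T) / 2  # symmetrize against floating-point round-off

r0 = rng.standard_normal(n)
r_t = residual_under_kernel_flow(K, r0, lr=0.1, t=100.0)

signal_basis, noise_basis = Q[:, :k], Q[:, k:]
# Fraction of the initial residual remaining in each subspace.
signal_frac = np.linalg.norm(signal_basis.T @ r_t) / np.linalg.norm(signal_basis.T @ r0)
noise_frac = np.linalg.norm(noise_basis.T @ r_t) / np.linalg.norm(noise_basis.T @ r0)

print(f"signal residual remaining: {signal_frac:.3e}")
print(f"noise residual remaining:  {noise_frac:.6f}")
```

In this toy setting the residual in the large-eigenvalue directions is driven to essentially zero, while the residual in the near-zero-eigenvalue directions is left almost untouched, which matches the "reservoir" picture in the abstract. The sketch does not model minibatch SGD or kernel evolution, both of which the paper treats.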
