Generalization by design: Shortcuts to Generalization in Deep Learning
Petr Taborsky, Lars Kai Hansen

TL;DR
This paper introduces a geometric regularizer based on spectral products over layers that enhances generalization in deep learning models, supported by theory and verified through experiments on various architectures and datasets.
Contribution
It proposes a novel geometric regularizer and structural regularizers that encode generalization into network architecture, challenging the notion of implicit bias in vanilla SGD.
Findings
The proposed regularizers improve both generalization and accuracy.
Theoretical backing confirms the effectiveness of the geometric regularizer.
Experimental results on multiple architectures and datasets support the approach.
Abstract
We take a geometrical viewpoint and present a unifying view on supervised deep learning with the Bregman divergence loss function, which covers frequent classification and prediction tasks. Motivated by simulations, we suggest that there is principally no implicit bias of vanilla stochastic gradient descent training of deep models towards "simpler" functions. Instead, we show that good generalization may be instigated by bounded spectral products over layers, leading to a novel geometric regularizer. In deep enough models, such a regularizer enables both extreme accuracy and generalization to be reached. We associate popular regularization techniques like weight decay, dropout, batch normalization, and early stopping with this perspective. Backed up by theory, we further demonstrate that "generalization by design" is practically possible and that good generalization…
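As a rough illustration of what a "bounded spectral product over layers" could look like in practice, the sketch below computes the product of per-layer spectral norms (largest singular values) for a stack of weight matrices, plus a log-domain penalty on that product. This is not the paper's regularizer, whose exact form is not given in the abstract. The function names, the `strength` and `eps` parameters, and the choice of a log-sum penalty are all assumptions made for illustration.

```python
import numpy as np


def spectral_norm(weight):
    """Largest singular value of a 2-D weight matrix."""
    return float(np.linalg.norm(weight, ord=2))


def spectral_product(weights):
    """Product of per-layer spectral norms.

    For a feed-forward network with 1-Lipschitz activations, this
    product upper-bounds the network's Lipschitz constant.
    """
    return float(np.prod([spectral_norm(w) for w in weights]))


def log_spectral_penalty(weights, strength=1e-2, eps=1e-12):
    """Hypothetical penalty: strength * log of the spectral product.

    Summing logs instead of multiplying norms avoids overflow or
    underflow in deep stacks. `strength` and `eps` are illustrative
    parameters, not values taken from the paper.
    """
    return strength * sum(np.log(spectral_norm(w) + eps) for w in weights)


# Toy two-layer example with random weights.
rng = np.random.default_rng(0)
layers = [rng.normal(size=(8, 4)), rng.normal(size=(3, 8))]
print("spectral product:", spectral_product(layers))
print("penalty term:", log_spectral_penalty(layers))
```

In a training loop, a term like `log_spectral_penalty` would be added to the task loss. Doing so with gradients requires a differentiable framework rather than plain NumPy, which is used here only to keep the sketch self-contained.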
Taxonomy
Topics: Gaussian Processes and Bayesian Inference · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications
Methods: Early Stopping
