Regularization Matters: Generalization and Optimization of Neural Nets v.s. their Induced Kernel

Colin Wei; Jason D. Lee; Qiang Liu; Tengyu Ma

arXiv:1810.05369·stat.ML·April 28, 2020·46 cites

TL;DR

This paper shows that explicit regularization affects the generalization and sample efficiency of neural networks: on a constructed distribution in $d$ dimensions, the optimal $\ell_2$-regularized neural net learns with $O(d)$ samples, whereas the NTK requires $\Omega(d^2)$ samples.

Contribution

It introduces new analysis tools for understanding the impact of regularization on neural nets and kernel methods, and proves that regularized neural nets can be globally optimized in a polynomial number of iterations.

Findings

01. Regularized neural nets learn with fewer samples than the NTK: $O(d)$ versus $\Omega(d^2)$ on a constructed distribution.

02. For multi-layer feedforward ReLU nets, the global minimizer of a weakly regularized cross-entropy loss is the max normalized margin solution.

03. Gradient descent can efficiently find the global minimum of the regularized loss in neural nets.

Abstract

Recent works have shown that on sufficiently over-parametrized neural nets, gradient descent with relatively large initialization optimizes a prediction function in the RKHS of the Neural Tangent Kernel (NTK). This analysis leads to global convergence results but does not work when there is a standard $\ell_2$ regularizer, which is useful to have in practice. We show that sample efficiency can indeed depend on the presence of the regularizer: we construct a simple distribution in $d$ dimensions which the optimal regularized neural net learns with $O(d)$ samples but the NTK requires $\Omega(d^2)$ samples to learn. To prove this, we establish two analysis tools: i) for multi-layer feedforward ReLU nets, we show that the global minimizer of a weakly-regularized cross-entropy loss is the max normalized margin solution among all neural nets, which generalizes well; ii) we develop a new…
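
To make the two quantities named in the abstract concrete, here is a minimal sketch, not the authors' code. It assumes a bias-free two-layer ReLU net, binary labels in {+1, -1}, and the logistic loss as the two-class form of cross-entropy. All function names (`two_layer_net`, `regularized_loss`, `normalized_margin`) are hypothetical. Because this net is 2-homogeneous in its parameters, the sketch normalizes the margin by the squared parameter norm, which makes it invariant to rescaling the weights.

```python
import math

def relu(z):
    return max(0.0, z)

def two_layer_net(W, a, x):
    """f(x) = sum_j a[j] * relu(<W[j], x>). No biases (an assumption),
    so scaling all parameters by c scales f by c**2."""
    return sum(a_j * relu(sum(w * xi for w, xi in zip(w_j, x)))
               for w_j, a_j in zip(W, a))

def sq_norm(W, a):
    """Squared l2 norm of all parameters."""
    return sum(w * w for w_j in W for w in w_j) + sum(a_j * a_j for a_j in a)

def softplus(t):
    """Numerically stable log(1 + exp(t))."""
    return max(t, 0.0) + math.log1p(math.exp(-abs(t)))

def regularized_loss(W, a, X, y, lam):
    """Mean logistic loss plus lam * ||theta||^2 (l2 regularizer)."""
    data = sum(softplus(-yi * two_layer_net(W, a, xi))
               for xi, yi in zip(X, y)) / len(X)
    return data + lam * sq_norm(W, a)

def normalized_margin(W, a, X, y):
    """min_i y_i f(x_i) / ||theta||^2, i.e. the margin of the net
    evaluated at theta / ||theta|| for this 2-homogeneous model."""
    margin = min(yi * two_layer_net(W, a, xi) for xi, yi in zip(X, y))
    return margin / sq_norm(W, a)

# Toy data chosen for illustration only.
W = [[1.0, 0.0], [0.0, 1.0]]
a = [1.0, -1.0]
X = [[1.0, 0.0], [0.0, 1.0]]
y = [1, -1]
print(normalized_margin(W, a, X, y))
print(regularized_loss(W, a, X, y, lam=0.1))
```

The abstract's first analysis tool concerns the regime where `lam` is small: there, the minimizer of a loss like `regularized_loss` is characterized by the largest attainable value of a quantity like `normalized_margin`.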

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Advanced Neural Network Applications · Machine Learning and ELM

Methods: Neural Tangent Kernel