Implicit Regularization in Deep Learning

Behnam Neyshabur

arXiv:1709.01953 · cs.LG · September 11, 2017 · 77 cites

TL;DR

This paper explores how implicit regularization from optimization algorithms influences generalization in deep learning, analyzing complexity measures and invariances to better understand neural network success.

Contribution

It demonstrates the role of implicit regularization in deep learning and investigates complexity measures and invariances that explain generalization phenomena.

Findings

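01. Implicit regularization induced by the optimization method plays a key role in generalization.

02. Complexity measures are evaluated empirically for how well they explain observed phenomena in deep learning.

03. Complexity measures and optimization algorithms can be chosen to share the invariances of neural networks.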

Abstract

In an attempt to better understand generalization in deep learning, we study several possible explanations. We show that implicit regularization induced by the optimization method is playing a key role in generalization and success of deep learning models. Motivated by this view, we study how different complexity measures can ensure generalization and explain how optimization algorithms can implicitly regularize complexity measures. We empirically investigate the ability of these measures to explain different observed phenomena in deep learning. We further study the invariances in neural networks, suggest complexity measures and optimization algorithms that have similar invariances to those in neural networks and evaluate them on a number of learning tasks.
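
The notion of an invariance can be made concrete with a small sketch. The abstract does not name a specific invariance or measure, so the choices below are assumptions made for illustration: a two-layer ReLU network without biases, the node-wise rescaling invariance (multiply a hidden unit's incoming weights by c > 0 and divide its outgoing weights by c), and a squared path norm as an example of a measure that shares this invariance. The function names `forward`, `l2_norm_sq`, and `path_norm_sq` are hypothetical and do not come from the paper or its repository.

```python
import numpy as np

def forward(W1, W2, x):
    """Two-layer ReLU network without biases: x -> W2 @ relu(W1 @ x)."""
    return W2 @ np.maximum(W1 @ x, 0.0)

def l2_norm_sq(W1, W2):
    """Sum of squared weights, the quantity penalized by weight decay."""
    return float(np.sum(W1 ** 2) + np.sum(W2 ** 2))

def path_norm_sq(W1, W2):
    """Sum over all input-hidden-output paths of the product of squared weights."""
    return float(np.sum((W2 ** 2) @ (W1 ** 2)))

rng = np.random.default_rng(0)
W1 = rng.standard_normal((5, 3))   # hidden x input
W2 = rng.standard_normal((2, 5))   # output x hidden
x = rng.standard_normal(3)

# Rescale hidden unit 0: incoming weights times c, outgoing weights divided by c.
c = 10.0
W1_s, W2_s = W1.copy(), W2.copy()
W1_s[0, :] *= c
W2_s[:, 0] /= c

# ReLU is positively homogeneous, so the network output should not change.
same_output = np.allclose(forward(W1, W2, x), forward(W1_s, W2_s, x))

l2_before, l2_after = l2_norm_sq(W1, W2), l2_norm_sq(W1_s, W2_s)
path_before, path_after = path_norm_sq(W1, W2), path_norm_sq(W1_s, W2_s)

print("same function:", same_output)
print("l2 norm^2:    ", l2_before, "->", l2_after)
print("path norm^2:  ", path_before, "->", path_after)
```

In this sketch the two weight settings compute the same function, yet the sum of squared weights differs between them while the path-based measure does not. A measure that changes under a transformation the network itself ignores cannot cleanly track generalization, which is one way to read the abstract's interest in measures and algorithms that share the network's invariances.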

Code & Models

Repositories

bneyshabur/generalization-bounds
PyTorch · Official

Taxonomy

Topics: Sparse and Compressive Sensing Techniques · Neural Networks and Applications · Stochastic Gradient Optimization Techniques