Robust Large Margin Deep Neural Networks

Jure Sokolic; Raja Giryes; Guillermo Sapiro; Miguel R. D. Rodrigues

arXiv:1605.08254 · stat.ML · July 4, 2017

TL;DR

This paper analyzes the generalization error of deep neural networks through the network's Jacobian matrix. It argues that keeping the Jacobian's spectral norm bounded near the training samples is what lets networks of arbitrary depth and width generalize well, and it supports the analysis with experiments on several datasets.

Contribution

The paper introduces a Jacobian-based analysis of generalization in deep neural networks, yielding generalization bounds that do not grow with network width or depth, together with a regularizer motivated by that analysis.

Findings

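01 A bounded spectral norm of the network's Jacobian in the neighbourhood of the training samples is associated with better generalization.

02 The batch normalization and weight normalization re-parametrizations have good generalization properties under this analysis.

03 Experiments on the MNIST, CIFAR-10, LaRED, and ImageNet datasets support the analysis.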

Abstract

The generalization error of deep neural networks via their classification margin is studied in this work. Our approach is based on the Jacobian matrix of a deep neural network and can be applied to networks with arbitrary non-linearities and pooling layers, and to networks with different architectures such as feed forward networks and residual networks. Our analysis leads to the conclusion that a bounded spectral norm of the network's Jacobian matrix in the neighbourhood of the training samples is crucial for a deep neural network of arbitrary depth and width to generalize well. This is a significant improvement over the current bounds in the literature, which imply that the generalization error grows with either the width or the depth of the network. Moreover, it shows that the recently proposed batch normalization and weight normalization re-parametrizations enjoy good generalization…
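The quantity at the centre of the abstract, the spectral norm of the network's Jacobian at a sample, can be illustrated on a toy model. The following is a minimal sketch, not the paper's code: the two-layer ReLU network, its random weights, and the helper names `relu_net`, `jacobian`, and `spectral_norm` are all assumptions made for illustration. It computes the input-output Jacobian at one point, takes its largest singular value, and compares it with the product of the layer spectral norms, which is a looser bound.

```python
import numpy as np

def relu_net(x, W1, W2):
    """Hypothetical two-layer ReLU network f(x) = W2 @ relu(W1 @ x)."""
    return W2 @ np.maximum(W1 @ x, 0.0)

def jacobian(x, W1, W2):
    """Input-output Jacobian of relu_net at x.

    For a ReLU network this is W2 @ diag(mask) @ W1, where mask marks
    the hidden units that are active at x.
    """
    active = (W1 @ x > 0).astype(x.dtype)
    return W2 @ (active[:, None] * W1)

def spectral_norm(M):
    """Largest singular value of M."""
    return np.linalg.norm(M, 2)

# Illustrative random weights and input (not from the paper).
rng = np.random.default_rng(0)
W1 = rng.standard_normal((8, 4))
W2 = rng.standard_normal((3, 8))
x = rng.standard_normal(4)

J = jacobian(x, W1, W2)
jac_norm = spectral_norm(J)

# The product of layer norms upper-bounds the Jacobian norm,
# because the ReLU mask has spectral norm at most 1.
layer_bound = spectral_norm(W1) * spectral_norm(W2)

# For a small perturbation d that flips no ReLU, the output moves by
# exactly J @ d, so its length is at most jac_norm * ||d||.
d = 1e-6 * rng.standard_normal(4)
output_change = np.linalg.norm(relu_net(x + d, W1, W2) - relu_net(x, W1, W2))

print(f"Jacobian spectral norm: {jac_norm:.4f}")
print(f"Product of layer norms: {layer_bound:.4f}")
print(f"Output change / input change: {output_change / np.linalg.norm(d):.4f}")
```

The last printed ratio stays below the Jacobian spectral norm, which is the local sense in which a bounded Jacobian norm keeps nearby inputs from crossing the classification boundary.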

Taxonomy

Methods: Batch Normalization