Function Norms and Regularization in Deep Networks
Amal Rannen Triki, Maxim Berman, Matthew B. Blaschko

TL;DR
This paper introduces sampling-based approximations of function norms as regularizers for deep neural networks. It backs them with theoretical results and with experiments in which they improve over traditional techniques such as weight decay and dropout.
Contribution
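It proposes the first practical approach to regularizing DNNs with function norms. The theoretical contributions are a proof that computing function norms of DNNs is NP-hard and a generalization bound for functions trained with weighted norms; both are complemented by empirical validation on real-world tasks.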
Findings
Regularization with function norms improves model performance.
The proposed methods outperform weight decay, dropout, and batch normalization.
Sampling-based approximation is effective despite NP-hardness of exact computation.
Abstract
Deep neural networks (DNNs) have become increasingly important due to their excellent empirical performance on a wide range of problems. However, regularization is generally achieved by indirect means, largely due to the complex set of functions defined by a network and the difficulty in measuring function complexity. There exists no method in the literature for additive regularization based on a norm of the function, as is classically considered in statistical learning theory. In this work, we propose sampling-based approximations to weighted function norms as regularizers for deep neural networks. We provide, to the best of our knowledge, the first proof in the literature of the NP-hardness of computing function norms of DNNs, motivating the necessity of an approximate approach. We then derive a generalization bound for functions trained with weighted norms and prove that a natural…
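As an illustration of what a sampling-based approximation to a weighted function norm could look like, here is a minimal sketch. It is not the authors' implementation. It assumes the norm is the squared L2 norm of the network output under a sampling distribution Q, estimated by plain Monte Carlo averaging. The names `mc_function_norm_sq`, `tiny_mlp`, `regularized_loss`, and the `sampler` callback are hypothetical and chosen only for this example.

```python
import numpy as np

def mc_function_norm_sq(f, sampler, n_samples=1024, rng=None):
    """Monte Carlo estimate of E_{x ~ Q}[ ||f(x)||^2 ].

    Assumption: Q is represented by `sampler(rng, n)`, which returns an
    array of shape (n, d_in); `f` maps it to an array of shape (n, d_out).
    """
    rng = np.random.default_rng() if rng is None else rng
    x = sampler(rng, n_samples)
    y = f(x)
    # Squared Euclidean norm of each output, averaged over the samples.
    return float(np.mean(np.sum(y ** 2, axis=1)))

def tiny_mlp(x, w1, b1, w2, b2):
    """One-hidden-layer ReLU network, used only as a stand-in for a DNN."""
    hidden = np.maximum(x @ w1 + b1, 0.0)
    return hidden @ w2 + b2

def regularized_loss(data_loss, f, sampler, lam, n_samples=1024, rng=None):
    """Data-fit term plus lam times the sampled function-norm penalty."""
    penalty = mc_function_norm_sq(f, sampler, n_samples, rng)
    return data_loss + lam * penalty

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical choice of Q: a standard normal over 4 input dimensions.
    gaussian = lambda r, n: r.standard_normal((n, 4))
    w1, b1 = rng.standard_normal((4, 8)), np.zeros(8)
    w2, b2 = rng.standard_normal((8, 2)), np.zeros(2)
    net = lambda x: tiny_mlp(x, w1, b1, w2, b2)
    print(regularized_loss(0.5, net, gaussian, lam=1e-2, rng=rng))
```

In a training loop the penalty would be recomputed on fresh samples at each step and differentiated with respect to the weights, which requires an autodiff framework rather than plain NumPy. The choice of Q determines which regions of input space the penalty constrains.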
Taxonomy
Topics: Neural Networks and Applications · Face and Expression Recognition · Stochastic Gradient Optimization Techniques
