Penalising the biases in norm regularisation enforces sparsity
Etienne Boursier, Nicolas Flammarion

TL;DR
For one-hidden-layer ReLU networks with unidimensional data, this paper shows that the parameters' norm needed to represent a function equals the total variation of its second derivative, weighted by a factor that appears only when the bias terms are penalised. This weighting factor is what makes the minimal norm interpolator unique and sparse in its number of kinks.
Contribution
A theoretical analysis linking bias regularisation to the sparsity and uniqueness of minimal norm interpolators in one-hidden-layer ReLU networks with unidimensional data.
Findings
Regularising the biases introduces a weighting factor in the total variation of the second derivative, and this factor enforces sparsity.
Omitting bias regularisation allows for non-sparse solutions.
Regularising biases leads to minimal norm interpolators with fewer kinks.
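The contrast between the last two findings can be illustrated on a toy problem. The sketch below is not taken from the paper: the four data points are hypothetical, each kink is assumed to be carried by a single ReLU neuron, and the weight sqrt(1 + t^2) for a kink at position t is an assumption derived from a rescaling argument for one neuron, not a formula quoted from the source. Under these assumptions, two interpolants with different numbers of kinks have the same cost when biases are unpenalised, while the one-kink interpolant is strictly cheaper once biases are penalised.

```python
import math

def interpolant(x, kinks, jumps):
    """Piecewise-linear function written as sum_i jumps[i] * relu(x - kinks[i])."""
    return sum(c * max(x - t, 0.0) for t, c in zip(kinks, jumps))

def norm_cost(kinks, jumps, penalise_bias):
    """Sum of |slope change| over kinks.

    Assumption: when biases are penalised, a kink at position t is
    weighted by sqrt(1 + t^2); otherwise every kink has weight 1.
    """
    return sum(
        abs(c) * (math.sqrt(1.0 + t * t) if penalise_bias else 1.0)
        for t, c in zip(kinks, jumps)
    )

# Hypothetical data set, chosen so that two different interpolants exist.
data = [(0.0, 0.0), (1.0, 0.0), (3.0, 2.0), (4.0, 4.0)]

two_kinks = ([1.0, 3.0], [1.0, 1.0])  # connect-the-dots interpolant
one_kink = ([2.0], [2.0])             # sparser interpolant, single kink at x = 2

for name, (kinks, jumps) in [("two kinks", two_kinks), ("one kink", one_kink)]:
    fits = all(abs(interpolant(x, kinks, jumps) - y) < 1e-12 for x, y in data)
    print(
        name,
        "| interpolates:", fits,
        "| cost without bias penalty:", norm_cost(kinks, jumps, False),
        "| cost with bias penalty:", round(norm_cost(kinks, jumps, True), 4),
    )
```

Both functions pass through all four points. Without the bias penalty their costs tie, so nothing favours the sparser one. With the penalty, the single-kink interpolant has the smaller cost.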
Abstract
Controlling the parameters' norm often yields good generalisation when training neural networks. Beyond simple intuitions, the relation between regularising parameters' norm and obtained estimators remains theoretically misunderstood. For one hidden ReLU layer networks with unidimensional data, this work shows the parameters' norm required to represent a function is given by the total variation of its second derivative, weighted by a factor. Notably, this weighting factor disappears when the norm of bias terms is not regularised. The presence of this additional weighting factor is of utmost significance as it is shown to enforce the uniqueness and sparsity (in the number of kinks) of the minimal norm interpolator. Conversely, omitting the bias' norm allows for non-sparse solutions. Penalising the bias terms in the regularisation, either explicitly or implicitly, thus…
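To see where a position-dependent weighting factor can come from, consider a single neuron a * relu(w * x + b). The sketch below is a numerical check under stated assumptions, not the paper's derivation: the names a, w, b, the scale s, and the penalty (a^2 + w^2 + b^2) / 2 are all chosen here for illustration. For any scale s > 0, the parameters a = c / s, w = s, b = -t * s give the same function c * relu(x - t), so the norm needed to represent that kink is the minimum of the penalty over s. A grid search recovers |c| * sqrt(1 + t^2) when the bias is penalised and |c| when it is not.

```python
import math

def half_sq_norm(a, w, b, penalise_bias):
    """Assumed penalty: half the squared parameter norm, bias optional."""
    return 0.5 * (a * a + w * w + (b * b if penalise_bias else 0.0))

def min_norm_over_rescalings(c, t, penalise_bias, n=20001):
    """Grid-search the scale s > 0 in the family a = c/s, w = s, b = -t*s.

    Every member of this family represents c * relu(x - t), so the
    minimum is the smallest penalty at which one neuron can produce it.
    """
    best = math.inf
    for i in range(n):
        s = math.exp(-5.0 + 10.0 * i / (n - 1))  # log-spaced scales in [e^-5, e^5]
        best = min(best, half_sq_norm(c / s, s, -t * s, penalise_bias))
    return best

c, t = 1.5, 2.0  # hypothetical slope change and kink position
print("bias penalised:  ", min_norm_over_rescalings(c, t, True),
      "vs |c| * sqrt(1 + t^2) =", abs(c) * math.sqrt(1.0 + t * t))
print("bias unpenalised:", min_norm_over_rescalings(c, t, False),
      "vs |c| =", abs(c))
```

This only covers one neuron. The statement that the minimal norm over all network representations of a function is the weighted total variation of its second derivative is the paper's result and is not reproduced by this check.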
Taxonomy
Topics: Neural Networks and Applications · Machine Learning and ELM · Stochastic Gradient Optimization Techniques
