Canonical Regularisation of Wide Feature-Learning Neural Networks
George Whittle, Pranav Vaidhyanathan, Juliusz Ziomek, Natalia Ares, Maike A. Osborne

TL;DR
This paper investigates the regularisation properties of wide neural networks in the feature-learning regime, showing that ridge regularisation introduces biases and proposing a unified framework that covers both the kernel and feature-learning regimes.
Contribution
It introduces a regime-agnostic function-space energy framework that characterises canonical regularisation, extends ridge concepts to feature-learning networks, and proposes practical surrogate methods.
Findings
Ridge regularisation biases feature-learning networks, even in the limit of vanishing regularisation.
The canonical prior is a Riemannian Gibbs Process, generalising Gaussian Processes.
Arc ridge is proposed as a scalable surrogate, linking early stopping to regularisation.
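The link between early stopping and regularisation is easiest to see in the classical linear (kernel-regime) setting. The sketch below illustrates only that textbook case; it is not arc ridge, which this page does not describe. A minimal sketch, assuming linear least squares started from zero: gradient flow stopped at time t shrinks each eigendirection of X^T X with eigenvalue sigma by the factor 1 - exp(-sigma t), while ridge with strength lam shrinks it by sigma / (sigma + lam). The function names, the eigenvalue grid and the pairing lam = 1/t are illustrative assumptions.

```python
import numpy as np


def gradient_flow_filter(sigma, t):
    """Shrinkage that gradient flow on 0.5*||X theta - y||^2, started at theta = 0
    and stopped at time t, applies to an eigendirection of X^T X with eigenvalue sigma."""
    return -np.expm1(-sigma * t)  # equals 1 - exp(-sigma * t), computed accurately


def ridge_filter(sigma, lam):
    """Shrinkage that ridge with penalty 0.5*lam*||theta||^2 applies to the same direction."""
    return sigma / (sigma + lam)


# Hypothetical grid of eigenvalues spanning eight orders of magnitude.
sigmas = np.logspace(-4, 4, 200)

gaps = {}
for t in [0.1, 1.0, 10.0, 100.0]:
    # Illustrative assumption: pair the stopping time t with ridge strength lam = 1/t.
    gf = gradient_flow_filter(sigmas, t)
    rr = ridge_filter(sigmas, 1.0 / t)
    gaps[t] = np.max(np.abs(gf - rr))
    print(f"t = {t:g}: max |filter gap| with lam = 1/t is {gaps[t]:.3f}")
```

Both filters pass directions with large sigma and suppress those with small sigma, and with lam = 1/t the largest gap between them stays at roughly 0.2 for every t, so in this linear toy the stopping time behaves like an inverse ridge strength.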
Abstract
Wide neural networks in the feature-learning regime drive modern deep learning, and yet they remain far less studied than their kernel-regime counterparts. We consider a critical yet under-explored difference between these two regimes: the regulariser and prior implied by gradient flow training. This canonical regularisation property is well studied in kernel-regime networks: of the infinitely many global minima, gradient flow selects exactly the vanishing-ridge solution. It underpins the celebrated NN-GP correspondence, precisely allowing the modelling of noise during training. However, we prove that ridge regularisation biases gradient flow in feature-learning regime networks, even in the infinitesimal limit of vanishing regularisation. Over training, ridge distorts the inductive bias of the network, with particular damage done to pretrained networks where the implicit prior is…
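The kernel-regime property quoted in the abstract can be checked numerically on a linear model, since a kernel-regime network behaves like a model that is linear in its parameters. A minimal sketch, assuming an over-parameterised least-squares problem with random data: gradient descent from an initialisation theta0 reaches the interpolant closest to theta0, and ridge centred at theta0 approaches that same point as lam goes to 0. The helper names, problem sizes and step size are hypothetical choices for illustration, not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5, 20  # fewer data points than parameters: infinitely many global minima
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)
theta0 = rng.standard_normal(d)  # initialisation (hypothetical)


def gradient_descent(X, y, theta0, steps=20000):
    """Gradient descent on 0.5*||X theta - y||^2, a discretisation of gradient flow."""
    lr = 1.0 / np.linalg.norm(X, 2) ** 2  # stable step size: 1 / largest eigenvalue of X^T X
    theta = theta0.copy()
    for _ in range(steps):
        theta -= lr * X.T @ (X @ theta - y)
    return theta


def ridge(X, y, theta0, lam):
    """argmin 0.5*||X theta - y||^2 + 0.5*lam*||theta - theta0||^2 (ridge centred at theta0)."""
    A = X.T @ X + lam * np.eye(X.shape[1])
    return theta0 + np.linalg.solve(A, X.T @ (y - X @ theta0))


theta_gd = gradient_descent(X, y, theta0)
# Interpolant closest to theta0, i.e. the vanishing-ridge limit.
theta_star = theta0 + np.linalg.pinv(X) @ (y - X @ theta0)

for lam in [1.0, 1e-2, 1e-4, 1e-6]:
    gap = np.linalg.norm(ridge(X, y, theta0, lam) - theta_gd)
    print(f"lam = {lam:g}: ||theta_ridge - theta_gd|| = {gap:.2e}")
print("gradient descent interpolates:", np.allclose(X @ theta_gd, y))
```

The distance to the gradient-descent solution shrinks monotonically as lam decreases. Updates never leave theta0 + rowspace(X), which is why gradient descent selects this particular minimum. The paper's claim is that this correspondence breaks down in the feature-learning regime, which a linear toy like this cannot show.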
