Canonical Regularisation of Wide Feature-Learning Neural Networks

George Whittle; Pranav Vaidhyanathan; Juliusz Ziomek; Natalia Ares; Maike A. Osborne

arXiv:2605.18180 · stat.ML · May 19, 2026

TL;DR

This paper investigates the regularisation properties of wide neural networks in the feature-learning regime, revealing biases introduced by ridge regularisation and proposing a unified framework that covers both the kernel and feature-learning regimes.

Contribution

The paper introduces a regime-agnostic function-space energy framework that characterises canonical regularisation, extends ridge concepts to feature-learning networks, and proposes practical surrogate methods.

Findings

1. Ridge regularisation biases feature-learning networks even with vanishing regularisation.
2. The canonical prior is a Riemannian Gibbs Process, generalising Gaussian Processes.
3. Arc ridge is proposed as a scalable surrogate, linking early stopping to regularisation.

Abstract

Wide neural networks in the feature-learning regime drive modern deep learning, and yet they remain far less studied than their kernel-regime counterparts. We consider a critical yet under-explored difference between these two regimes: the regulariser and prior implied by gradient flow training. This canonical regularisation property is well studied in kernel-regime networks (of the infinitely many global minima, gradient flow selects exactly the vanishing-ridge solution) and underpins the celebrated NN-GP correspondence, precisely allowing noise to be modelled during training. However, we prove that ridge regularisation biases gradient flow in networks in the feature-learning regime, even in the infinitesimal limit of vanishing regularisation. Over training, ridge distorts the inductive bias of the network, with particular damage done to pretrained networks where the implicit prior is…
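The kernel-regime facts quoted in the abstract can be checked numerically in the simplest setting. What follows is a minimal sketch, not the paper's method. It assumes that a linear model in fixed features stands in for a linearised (kernel-regime) network initialised at zero. For a general initialisation, the norm would instead be measured from the initial parameters. Small-step gradient descent serves as a discrete stand-in for gradient flow. The sketch shows two things. First, gradient descent converges to the minimum-norm interpolant, which coincides with ridge as the ridge parameter goes to zero. Second, the Bayesian posterior mean under a unit Gaussian weight prior (the weight-space view of a GP with a linear kernel) equals ridge with the ridge parameter set to the noise variance. All helper names (`ridge_solution`, `gradient_descent`, `bayesian_posterior_mean`) are hypothetical.

```python
import numpy as np

# Hypothetical helpers: a linear model in fixed features stands in for a
# kernel-regime (linearised) network initialised at zero.

def ridge_solution(X, y, lam):
    # Ridge minimiser of ||Xw - y||^2 + lam * ||w||^2, written in the dual
    # (kernel) form X^T (X X^T + lam I)^{-1} y, which stays well conditioned
    # for tiny lam when there are more parameters than data points.
    n = X.shape[0]
    return X.T @ np.linalg.solve(X @ X.T + lam * np.eye(n), y)


def gradient_descent(X, y, lr, steps):
    # Small-step gradient descent on 0.5 * ||Xw - y||^2 from w = 0,
    # used here as a discrete-time stand-in for gradient flow.
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        w -= lr * (X.T @ (X @ w - y))
    return w


def bayesian_posterior_mean(X, y, noise_var):
    # Posterior mean of w under the prior w ~ N(0, I) with Gaussian
    # observation noise of variance noise_var (weight-space view of a GP
    # with linear kernel k(x, x') = x . x').
    d = X.shape[1]
    precision = X.T @ X / noise_var + np.eye(d)
    return np.linalg.solve(precision, X.T @ y / noise_var)


rng = np.random.default_rng(0)
n, d = 10, 50  # more parameters than data: infinitely many global minima
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

w_gd = gradient_descent(X, y, lr=2e-3, steps=10_000)
w_min_norm = np.linalg.pinv(X) @ y                  # minimum-norm interpolant
w_vanishing_ridge = ridge_solution(X, y, lam=1e-8)  # ridge with lam -> 0

noise_var = 0.1
w_bayes = bayesian_posterior_mean(X, y, noise_var)
w_ridge_noise = ridge_solution(X, y, lam=noise_var)  # ridge with lam = noise variance

print(np.max(np.abs(w_gd - w_min_norm)))
print(np.max(np.abs(w_vanishing_ridge - w_min_norm)))
print(np.max(np.abs(w_bayes - w_ridge_noise)))
```

In this linear setting, all three routes agree up to floating-point error. The paper's claim is that this agreement breaks down for networks in the feature-learning regime. This sketch does not attempt to model that case.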