Gradient Descent in Neural Networks as Sequential Learning in RKBS

Alistair Shilton; Sunil Gupta; Santu Rana; Svetha Venkatesh

arXiv:2302.00205 · stat.ML · February 2, 2023

Open Access

TL;DR

This paper develops a new theoretical framework for neural network training by representing gradient descent as sequential learning in reproducing kernel Banach spaces (RKBS), extending the analysis beyond the over-parametrized regime.

Contribution

The paper introduces an exact power-series representation of the neural network, valid in a finite neighborhood of the initial weights, which allows training dynamics to be analyzed in RKBS beyond the first-order NTK approximation.

Findings

01. Gradient descent training can be modeled exactly as sequential learning in RKBS.

02. The paper provides new bounds on uniform convergence that depend on the iteration count and learning rate.

03. The analysis extends the theoretical understanding of neural network training beyond wide (over-parametrized) networks.

Abstract

The study of Neural Tangent Kernels (NTKs) has provided much-needed insight into the convergence and generalization properties of neural networks in the over-parametrized (wide) limit by approximating the network using a first-order Taylor expansion with respect to its weights in the neighborhood of their initialization values. This allows neural network training to be analyzed from the perspective of reproducing kernel Hilbert spaces (RKHS), which is informative in the over-parametrized regime but is a poor approximation for narrower networks, as the weights change more during training. Our goal is to extend beyond the limits of NTK toward a more general theory. We construct an exact power-series representation of the neural network in a finite neighborhood of the initial weights as an inner product of two feature maps, respectively from data and weight-step space, to feature space, allowing…
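
The first-order Taylor expansion that the abstract attributes to NTK analysis can be illustrated with a minimal sketch. It assumes a toy one-hidden-layer tanh network chosen purely for illustration; it is not the paper's construction and does not implement the RKBS power series. The names `network`, `grad_w`, `linearized`, and `ntk`, as well as the layer sizes and step sizes, are hypothetical. The sketch compares the network with its linearization around the initial weights for increasingly large weight steps.

```python
import numpy as np

def network(w, x):
    """Toy one-hidden-layer tanh network (illustrative assumption, not from the paper)."""
    W1 = w[:8].reshape(4, 2)   # hidden layer: 4 units, 2 inputs
    w2 = w[8:]                 # output layer: 4 weights
    return w2 @ np.tanh(W1 @ x)

def grad_w(w, x):
    """Analytic gradient of network(w, x) with respect to the packed weights w."""
    W1 = w[:8].reshape(4, 2)
    w2 = w[8:]
    h = np.tanh(W1 @ x)
    dW1 = np.outer(w2 * (1.0 - h**2), x)      # derivative w.r.t. hidden weights
    return np.concatenate([dW1.ravel(), h])   # derivative w.r.t. w2 is h

def linearized(w0, dw, x):
    """First-order Taylor expansion of the network around the initial weights w0."""
    return network(w0, x) + grad_w(w0, x) @ dw

def ntk(w0, x, x_other):
    """Empirical tangent kernel: inner product of weight gradients at w0."""
    return grad_w(w0, x) @ grad_w(w0, x_other)

rng = np.random.default_rng(0)
w0 = rng.normal(size=12)                 # hypothetical initial weights
x = np.array([0.5, -1.0])
x_other = np.array([1.5, 0.25])
direction = rng.normal(size=12)
direction /= np.linalg.norm(direction)   # unit-norm weight-step direction

steps = (1e-3, 1e-2, 1e-1)
errors = []
for step in steps:
    dw = step * direction
    # Gap between the true network and its linearization at w0 + dw.
    errors.append(abs(network(w0 + dw, x) - linearized(w0, dw, x)))

for step, err in zip(steps, errors):
    print(f"weight step {step:g}: linearization error {err:.3e}")
print("tangent kernel value:", ntk(w0, x, x_other))
```

In this toy setting the linearization error grows as the weight step grows, which is the sense in which a first-order approximation becomes less reliable when weights move further from initialization.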

Taxonomy

Topics: Model Reduction and Neural Networks · Neural Networks and Applications · Stochastic Gradient Optimization Techniques

Methods: Neural Tangent Kernel