Understanding the role of depth in the neural tangent kernel for overparameterized neural networks

William St-Arnaud; Margarida Carvalho; Golnoosh Farnadi

arXiv:2511.07272·cs.LG·November 11, 2025

Open Access

TL;DR

This paper investigates how increasing depth affects the neural tangent kernel of overparameterized ReLU networks, showing that the normalized limiting kernel approaches the matrix of ones while the corresponding closed-form solution approaches a fixed limit on the sphere.

Contribution

The paper provides a theoretical analysis of how the limiting kernel of ReLU networks behaves as depth increases, highlighting its convergence to a trivial kernel and the implications for generalization.

Findings

01 Normalized limiting kernel approaches the matrix of ones

02 Closed-form solutions converge to a fixed limit on the sphere

03 Depth influences the kernel's properties and generalization ability

Abstract

Overparameterized fully-connected neural networks have been shown to behave like kernel models when trained with gradient descent, under mild conditions on the width, the learning rate, and the parameter initialization. In the limit of infinitely large width and small learning rate, the resulting kernel allows the output of the learned model to be represented in closed form. This closed-form solution hinges on the invertibility of the limiting kernel, a property that often holds on real-world datasets. In this work, we analyze the sensitivity of large ReLU networks to increasing depth by characterizing the corresponding limiting kernel. Our theoretical results demonstrate that the normalized limiting kernel approaches the matrix of ones. In contrast, they show that the corresponding closed-form solution approaches a fixed limit on the sphere. We empirically evaluate the…
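
To make the role of invertibility concrete, here is a minimal NumPy sketch of a kernel-regression predictor of the form f(x) = K(x, X) K(X, X)^{-1} y. It is an illustration under stated assumptions, not the paper's code: the predictor drops any contribution from the network's output at initialization, the function names `kernel_regression_predict` and `toy_kernel` are hypothetical, and `toy_kernel` is a stand-in matrix (a blend of the all-ones matrix and the identity), not the neural tangent kernel analyzed in the paper. The sketch only shows that the all-ones matrix is singular and that conditioning worsens as a kernel matrix moves toward it.

```python
import numpy as np


def kernel_regression_predict(K_train, K_test_train, y_train):
    """Closed-form kernel predictor f(x) = K(x, X) K(X, X)^{-1} y.

    Assumption: K_train is invertible and the initial-output term is ignored.
    """
    # Solve K_train @ alpha = y_train rather than forming the inverse explicitly.
    alpha = np.linalg.solve(K_train, y_train)
    return K_test_train @ alpha


def toy_kernel(n, eps):
    """Hypothetical stand-in kernel: (1 - eps) * ones + eps * identity.

    As eps -> 0 the matrix moves toward the all-ones matrix.
    """
    return (1.0 - eps) * np.ones((n, n)) + eps * np.eye(n)


n = 4
y = np.array([1.0, -1.0, 0.5, 2.0])

# The all-ones matrix has rank 1, so it is singular whenever n > 1.
rank_of_ones = np.linalg.matrix_rank(np.ones((n, n)))

# Conditioning degrades as the toy kernel approaches the all-ones matrix.
cond_numbers = [np.linalg.cond(toy_kernel(n, eps)) for eps in (1e-1, 1e-3, 1e-5)]

# While the kernel is still invertible, the predictor interpolates the training labels.
K = toy_kernel(n, 1e-3)
train_predictions = kernel_regression_predict(K, K, y)
```

Using `np.linalg.solve` instead of an explicit inverse is a common numerical choice; how the paper's actual limiting kernel and its closed-form solution behave with depth is established by its theoretical results, not by this toy example.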

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Machine Learning and ELM · Domain Adaptation and Few-Shot Learning