Training NTK to Generalize with KARE

Johannes Schwab; Bryan Kelly; Semyon Malamud; Teng Andrea Xu

arXiv:2505.11347 · cs.LG · May 22, 2025

TL;DR

This paper introduces a method to explicitly optimize the neural tangent kernel (NTK) using the Kernel Alignment Risk Estimator (KARE). In the authors' experiments, the trained kernels generalize as well as or better than conventionally trained neural networks.
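
The objective can be sketched as follows. This assumes scalar outputs, training inputs x_1, ..., x_n with targets y in R^n, a network f(x; θ) with p parameters, and a fixed ridge parameter λ > 0, and it uses the KARE normalization of Jacot et al. (2020). The paper's exact parametrization and regularization may differ.

```latex
K_\theta = J_\theta J_\theta^\top, \qquad
(J_\theta)_{i,:} = \nabla_\theta f(x_i;\theta)^\top \in \mathbb{R}^{p},

\mathrm{KARE}(K_\theta,\lambda) =
\frac{\frac{1}{n}\, y^\top \left(\frac{1}{n}K_\theta + \lambda I_n\right)^{-2} y}
     {\left(\frac{1}{n}\operatorname{Tr}\left[\left(\frac{1}{n}K_\theta + \lambda I_n\right)^{-1}\right]\right)^{2}},
\qquad
\theta^\star \in \arg\min_\theta \mathrm{KARE}(K_\theta,\lambda).
```

KARE is computed from training data alone, yet it estimates the test risk of kernel ridge regression with kernel K_θ. Minimizing it over θ therefore reshapes the kernel itself instead of fitting the network's outputs to the training targets.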

Contribution

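The paper proposes training the NTK explicitly, with KARE as the objective. It reports that the resulting kernels match or outperform both standard DNNs and the NTKs derived from trained networks (after-kernels), which challenges the dominance of conventional end-to-end training.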
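
The following minimal NumPy sketch shows what "the NTK" of a network is concretely. It computes the empirical NTK Gram matrix K = J J^T, where J stacks per-example parameter gradients, for a hypothetical one-hidden-layer tanh network f(x) = a · tanh(W x) / sqrt(m). The architecture and the helper names `init_params`, `network`, `jacobian`, and `empirical_ntk` are illustrative assumptions, not the paper's models or code.

```python
import numpy as np


def init_params(d, m, seed=0):
    """Random parameters for a hypothetical toy network f(x) = a . tanh(W x) / sqrt(m)."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((m, d)), rng.standard_normal(m)


def network(X, W, a):
    """Scalar outputs of the toy network for a batch X of shape (n, d)."""
    return np.tanh(X @ W.T) @ a / np.sqrt(a.shape[0])


def jacobian(X, W, a):
    """Per-example gradients of f w.r.t. (W, a), shape (n, m * d + m)."""
    n, m = X.shape[0], a.shape[0]
    H = np.tanh(X @ W.T)          # hidden activations, shape (n, m)
    dH = 1.0 - H ** 2             # tanh'(W x), shape (n, m)
    grad_a = H / np.sqrt(m)       # df/da_j, shape (n, m)
    # df/dW_jk = a_j * tanh'(w_j . x) * x_k / sqrt(m), shape (n, m, d)
    grad_W = (a * dH)[:, :, None] * X[:, None, :] / np.sqrt(m)
    return np.concatenate([grad_W.reshape(n, -1), grad_a], axis=1)


def empirical_ntk(X, W, a):
    """Empirical NTK Gram matrix: K[i, j] = <grad f(x_i), grad f(x_j)>."""
    J = jacobian(X, W, a)
    return J @ J.T


X = np.random.default_rng(1).standard_normal((5, 3))
W, a = init_params(d=3, m=16)
K = empirical_ntk(X, W, a)
print(K.shape)  # (5, 5)
```

Evaluating `empirical_ntk` at the weights of a trained network gives the after-kernel. For real architectures, an autodiff library would normally compute J instead of hand-derived gradients.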

Findings

01. NTKs trained with KARE match or outperform the original DNNs.

02. Explicit NTK training can surpass traditional end-to-end DNN optimization in certain settings.

03. The results demonstrate the potential of explicit kernel training for better generalization.

Abstract

The performance of the data-dependent neural tangent kernel (NTK; Jacot et al. (2018)) associated with a trained deep neural network (DNN) often matches or exceeds that of the full network. This implies that DNN training via gradient descent implicitly performs kernel learning by optimizing the NTK. In this paper, we propose instead to optimize the NTK explicitly. Rather than minimizing empirical risk, we train the NTK to minimize its generalization error using the recently developed Kernel Alignment Risk Estimator (KARE; Jacot et al. (2020)). Our simulations and real data experiments show that NTKs trained with KARE consistently match or significantly outperform the original DNN and the DNN-induced NTK (the after-kernel). These results suggest that explicitly trained kernels can outperform traditional end-to-end DNN optimization in certain settings, challenging the conventional…
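
KARE needs only the training Gram matrix and the targets, which is what makes it usable as a training objective for the kernel. Below is a minimal NumPy sketch. It assumes scalar targets, a fixed ridge parameter `lam`, and the normalization of Jacot et al. (2020). The function name `kare` and the toy kernels are illustrative, not the authors' code.

```python
import numpy as np


def kare(K, y, lam):
    """Kernel Alignment Risk Estimator for a PSD Gram matrix K, targets y, ridge lam > 0.

    KARE = (1/n) y^T (K/n + lam I)^{-2} y / ((1/n) Tr[(K/n + lam I)^{-1}])^2
    """
    n = K.shape[0]
    evals, U = np.linalg.eigh(K)               # K = U diag(evals) U^T
    shrink = 1.0 / (evals / n + lam)           # eigenvalues of (K/n + lam I)^{-1}
    z = U.T @ y                                # targets in the kernel eigenbasis
    numerator = np.sum((shrink * z) ** 2) / n  # (1/n) y^T (K/n + lam I)^{-2} y
    denominator = (np.sum(shrink) / n) ** 2    # ((1/n) Tr[(K/n + lam I)^{-1}])^2
    return numerator / denominator


y = np.array([1.0, -1.0, 2.0, 0.5])
aligned = np.outer(y, y)      # kernel whose top eigenvector is y itself
isotropic = np.eye(4)         # kernel with no preferred direction
print(kare(aligned, y, lam=0.1) < kare(isotropic, y, lam=0.1))  # True
```

As a sanity check, the zero kernel gives KARE = mean(y**2), which is the risk of always predicting 0. A kernel aligned with the targets scores far lower than an isotropic one. A single eigendecomposition serves both the inverse-square term and the trace. In an autodiff framework, the same expression could be differentiated with respect to the parameters that generate K.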

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Domain Adaptation and Few-Shot Learning · Face recognition and analysis

Methods: Neural Tangent Kernel