Efficient kernel surrogates for neural network-based regression

Saad Qadeer; Andrew Engel; Amanda Howard; Adam Tsou; Max Vargas; Panos Stinis; and Tony Chiang

arXiv:2310.18612·cs.LG·January 25, 2024·1 cites

Open Access

TL;DR

This paper investigates the use of efficient kernel surrogates, specifically the Conjugate Kernel (CK), as practical approximations to Neural Tangent Kernels (NTKs) for improving the performance and understanding of neural network-based regression and classification tasks.

Contribution

It provides a theoretical analysis of CK performance relative to NTK, demonstrating near-equivalent results and offering a practical framework for enhancing neural network accuracy inexpensively.

Findings

01

CK performs nearly as well as NTK in regression and classification.

02

The regularity of the kernel significantly influences performance.

03

The approach improves GPT-2 classification and physics-informed operator training.

Abstract

Despite their immense promise in performing a variety of learning tasks, a theoretical understanding of the limitations of Deep Neural Networks (DNNs) has so far eluded practitioners. This is partly due to the inability to determine the closed forms of the learned functions, making it harder to study their generalization properties on unseen datasets. Recent work has shown that randomly initialized DNNs in the infinite width limit converge to kernel machines relying on a Neural Tangent Kernel (NTK) with known closed form. These results suggest, and experimental evidence corroborates, that empirical kernel machines can also act as surrogates for finite width DNNs. The high computational cost of assembling the full NTK, however, makes this approach infeasible in practice, motivating the need for low-cost approximations. In the current work, we study the performance of the Conjugate Kernel…
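To make the two kernels named in the abstract concrete, here is a minimal NumPy sketch for a tiny one-hidden-layer tanh network with scalar output. It is an illustration under stated assumptions, not the authors' implementation: the network, the function names (`init_params`, `conjugate_kernel`, `empirical_ntk`, `kernel_ridge_predict`), the ridge value, and the lack of any normalization are all hypothetical choices. The CK is taken here in its common form as the Gram matrix of last-hidden-layer features, and the empirical NTK as the Gram matrix of the gradients of the output with respect to all parameters; the paper's exact definitions may differ.

```python
import numpy as np

def init_params(rng, d_in, width):
    # Hypothetical one-hidden-layer tanh network: f(x) = w2 . tanh(W1 x + b1) + b2
    return {
        "W1": rng.normal(size=(width, d_in)) / np.sqrt(d_in),
        "b1": np.zeros(width),
        "w2": rng.normal(size=width) / np.sqrt(width),
        "b2": 0.0,
    }

def hidden(params, X):
    # Last-hidden-layer features, shape (n, width).
    return np.tanh(X @ params["W1"].T + params["b1"])

def conjugate_kernel(params, X1, X2):
    # CK (assumed form): inner products of last-hidden-layer features.
    return hidden(params, X1) @ hidden(params, X2).T

def param_jacobian(params, X):
    # Gradient of the scalar output w.r.t. every parameter, one row per input.
    H = hidden(params, X)                        # df/dw2, shape (n, width)
    D = params["w2"] * (1.0 - H**2)              # df/db1, shape (n, width)
    dW1 = (D[:, :, None] * X[:, None, :]).reshape(len(X), -1)  # df/dW1
    ones = np.ones((len(X), 1))                  # df/db2
    return np.hstack([dW1, D, H, ones])

def empirical_ntk(params, X1, X2):
    # Empirical NTK: inner products of full parameter gradients.
    return param_jacobian(params, X1) @ param_jacobian(params, X2).T

def kernel_ridge_predict(K_train, y_train, K_test_train, ridge=1e-6):
    # Kernel ridge regression with a small (assumed) ridge for stability.
    alpha = np.linalg.solve(K_train + ridge * np.eye(len(K_train)), y_train)
    return K_test_train @ alpha

rng = np.random.default_rng(0)
params = init_params(rng, d_in=2, width=64)
X_train = rng.uniform(-1.0, 1.0, size=(40, 2))
y_train = np.sin(np.pi * X_train[:, 0]) * X_train[:, 1]
X_test = rng.uniform(-1.0, 1.0, size=(10, 2))

pred_ck = kernel_ridge_predict(
    conjugate_kernel(params, X_train, X_train), y_train,
    conjugate_kernel(params, X_test, X_train))
pred_ntk = kernel_ridge_predict(
    empirical_ntk(params, X_train, X_train), y_train,
    empirical_ntk(params, X_test, X_train))
```

In this sketch the CK needs only a forward pass, while the NTK needs one gradient row per input over every parameter, which gives a rough sense of the cost gap the abstract refers to. For this architecture the NTK equals the CK plus the contributions of the remaining parameters, so the CK can be read as the last-layer part of the NTK.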

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Machine Learning and Data Classification · Neural Networks and Applications

Methods: Multi-Head Attention · Attention Is All You Need · Logistic Regression · Byte Pair Encoding · Dense Connections · Cosine Annealing · Linear Layer · Residual Connection · Weight Decay