Differentiable Kernel Ridge Regression for Deep Learning Pipelines
Jean-Marc Mercier, Gabriele Santin

TL;DR
This paper introduces differentiable, scalable kernel ridge regression modules called Sparse Kernels that can be integrated into deep learning pipelines, enabling new transfer, probing, and hybrid modeling capabilities.
Contribution
The authors present Sparse Kernels, a differentiable, local, and lazy kernel ridge regression variant that can be integrated into deep learning models as modular layers with trainable or fixed parameters.
Findings
Sparse Kernel (SK) modules match neural readouts with less training
SK modules improve performance when added to existing models
Kernel methods can be integrated with deep learning effectively
Abstract
Deep neural networks dominate modern machine learning, while alternative function approximators remain comparatively underexplored at scale. In this work, we revisit kernel methods as drop-in components for standard deep learning pipelines. We introduce Sparse Kernels (SKs), a differentiable, localized, and lazy variant of kernel ridge regression (KRR) that defers training to inference time and reduces to the solution of small local systems. We integrate SKs into PyTorch as modular layers that preserve end-to-end trainability, and we show that they expose three distinct sets of parameters (feature representations, target values, and evaluation points), each of which can be fixed or learned. This decomposition broadens the design space available to practitioners, enabling, in particular, training-free transfer, nonlinear probing, and hybrid kernel-neural models. Across…
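To make the "lazy" and "local" ideas in the abstract concrete, here is a minimal sketch of kernel ridge regression that fits nothing ahead of time and solves one small system per query. It is not the authors' implementation. The Gaussian kernel, the k-nearest-neighbour rule for choosing the local points, and every name (`gaussian_kernel`, `solve`, `local_krr_predict`, `k`, `ridge`, `lengthscale`) are assumptions made for illustration. The sketch uses only the Python standard library, so it does not show differentiability; in a PyTorch layer the same steps would presumably be written with differentiable tensor operations so that gradients can reach the stored features `X`, the targets `y`, and the query `z`.

```python
import math


def gaussian_kernel(a, b, lengthscale=1.0):
    """Gaussian (RBF) kernel between two points given as sequences of floats."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * lengthscale ** 2))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting.

    Intended only for the small k x k local systems in this sketch.
    """
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def local_krr_predict(z, X, y, k=5, ridge=1e-6, lengthscale=1.0):
    """Predict at query z using only the k stored points nearest to z.

    X: stored feature vectors, y: stored scalar targets, z: evaluation point.
    The k x k regularized system is built and solved at query time (lazy).
    """
    # Localize: rank stored points by squared Euclidean distance to the query.
    order = sorted(
        range(len(X)),
        key=lambda i: sum((xi - zi) ** 2 for xi, zi in zip(X[i], z)),
    )
    idx = order[:k]
    # Local kernel matrix with the ridge term added on the diagonal.
    K = [
        [gaussian_kernel(X[i], X[j], lengthscale) + (ridge if i == j else 0.0)
         for j in idx]
        for i in idx
    ]
    alpha = solve(K, [y[i] for i in idx])
    # Prediction is the kernel-weighted sum over the local neighbours.
    return sum(a * gaussian_kernel(z, X[i], lengthscale)
               for a, i in zip(alpha, idx))


if __name__ == "__main__":
    # Toy data: samples of sin(x) on a uniform grid over [0, 2].
    X = [[i / 10.0] for i in range(21)]
    y = [math.sin(x[0]) for x in X]
    print(local_krr_predict([0.95], X, y, k=5, lengthscale=0.3))
```

The three arguments `X`, `y`, and `z` correspond loosely to the three parameter sets named in the abstract (feature representations, target values, evaluation points). In this sketch all three are fixed inputs rather than learned quantities.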
