Interpretable Kernels

Patrick J.F. Groenen; Michael Greenacre

arXiv:2508.15932·stat.ML·August 25, 2025

TL;DR

This paper presents a method for interpreting kernel-based models in terms of the original features, so that nonlinear predictions can be traced back to the variables that produced them.

Contribution

The paper shows how to re-express a kernel solution as a linear combination of the original features when the feature matrix is wide, that is, when there are more features than observations.

Findings

01

Kernel solutions can be expressed as linear combinations of original features.

02

The re-expression is argued for the wide case, where the number of features exceeds the number of observations.

03

The approach applies to several kernel-based models, such as logistic and Poisson regression.

Abstract

The use of kernels for nonlinear prediction is widespread in machine learning. They have been popularized in support vector machines and used in kernel ridge regression, amongst others. Kernel methods share three aspects. First, instead of the original matrix of predictor variables or features, each observation is mapped into an enlarged feature space. Second, a ridge penalty term is used to shrink the coefficients on the features in the enlarged feature space. Third, the solution is not obtained in this enlarged feature space, but through solving a dual problem in the observation space. A major drawback in the present use of kernels is that the interpretation in terms of the original features is lost. In this paper, we argue that in the case of a wide matrix of features, where there are more features than observations, the kernel solution can be re-expressed in terms of a linear…
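The three aspects listed in the abstract can be illustrated with a minimal sketch of kernel ridge regression on a wide matrix. This is not the authors' procedure; it only uses the simplest possible case, a linear kernel without an intercept, where the enlarged feature space coincides with the original one. Under that assumption, the dual solution computed in observation space maps back to coefficients on the original features and agrees with ordinary ridge regression. The function names, the penalty value `lam`, and the simulated data are all illustrative assumptions.

```python
import numpy as np

def kernel_ridge_dual(K, y, lam):
    """Solve the dual problem (K + lam * I) alpha = y in observation space."""
    n = K.shape[0]
    return np.linalg.solve(K + lam * np.eye(n), y)

def ridge_primal(X, y, lam):
    """Solve the ridge problem (X'X + lam * I) w = X'y in feature space."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Simulated wide matrix: more features (p) than observations (n).
rng = np.random.default_rng(0)
n, p = 10, 50
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)
lam = 1.0  # ridge penalty, chosen arbitrarily for illustration

# Linear kernel: the n x n matrix of inner products between observations.
K = X @ X.T

# Dual solution: one coefficient per observation (an n x n system).
alpha = kernel_ridge_dual(K, y, lam)

# Map the dual solution back to one weight per original feature.
w_from_dual = X.T @ alpha

# Primal ridge solution computed directly (a p x p system).
w_primal = ridge_primal(X, y, lam)

# The two routes should agree up to floating-point error.
max_abs_diff = float(np.max(np.abs(w_from_dual - w_primal)))
print(max_abs_diff)
```

In the wide case the dual route solves an n × n system rather than a p × p one, which is why kernel methods work in observation space. For a nonlinear kernel the mapping back to the original features is no longer this simple, and that gap is what the paper addresses.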
