Interpretable Kernels
Patrick J.F. Groenen, Michael Greenacre

TL;DR
This paper presents a method to interpret kernel-based models in terms of original features, enabling better understanding of nonlinear predictions in machine learning.
Contribution
It introduces a way to re-express kernel solutions as linear combinations of original features, enhancing interpretability in high-dimensional settings.
Findings
Kernel solutions can be expressed as linear combinations of original features.
The interpretation holds in the wide-data setting, where there are more features than observations.
The approach applies to various kernel-based models, such as logistic and Poisson regression.
Abstract
The use of kernels for nonlinear prediction is widespread in machine learning. They have been popularized in support vector machines and used in kernel ridge regression, amongst others. Kernel methods share three aspects. First, instead of the original matrix of predictor variables or features, each observation is mapped into an enlarged feature space. Second, a ridge penalty term is used to shrink the coefficients on the features in the enlarged feature space. Third, the solution is not obtained in this enlarged feature space, but through solving a dual problem in the observation space. A major drawback in the present use of kernels is that the interpretation in terms of the original features is lost. In this paper, we argue that in the case of a wide matrix of features, where there are more features than observations, the kernel solution can be re-expressed in terms of a linear…
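The three aspects listed in the abstract can be illustrated with a minimal sketch. This is not the authors' method: it only shows the standard ridge identity for the special case of a linear kernel, where the solution found in the observation space (the dual) can be mapped back to coefficients on the original features. The function names `ridge_primal` and `ridge_dual`, the simulated data, and the penalty value `lam` are all assumptions made for illustration.

```python
import numpy as np

def ridge_primal(X, y, lam):
    """Ridge solution in feature space: w = (X'X + lam*I)^{-1} X'y (a p x p system)."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def ridge_dual(X, y, lam):
    """Ridge solution in observation space with a linear kernel K = XX' (an n x n system)."""
    n = X.shape[0]
    K = X @ X.T  # linear kernel; illustrative choice, not the paper's general case
    return np.linalg.solve(K + lam * np.eye(n), y)

# Hypothetical wide data: more features (p) than observations (n).
rng = np.random.default_rng(0)
n, p = 20, 100
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)
lam = 1.0  # ridge penalty, chosen arbitrarily

w_primal = ridge_primal(X, y, lam)

# Dual coefficients live in observation space (one per observation) ...
alpha = ridge_dual(X, y, lam)
# ... and are re-expressed as weights on the original features via w = X' alpha.
w_from_dual = X.T @ alpha

print(np.allclose(w_primal, w_from_dual))
```

With a wide matrix the dual system is only n by n, which is why solving in the observation space is attractive. For nonlinear kernels the mapping back to the original features is less direct, and that is the case the paper addresses.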
