The SKIM-FA Kernel: High-Dimensional Variable Selection and Nonlinear Interaction Discovery in Linear Time
Raj Agrawal, Tamara Broderick

TL;DR
This paper introduces the SKIM-FA kernel, which performs high-dimensional variable selection and nonlinear interaction discovery in time linear in the number of covariates. It removes the quadratic-or-worse cost of existing methods and improves performance on large datasets.
Contribution
The paper presents a kernel-based approach to variable selection and interaction detection whose runtime is linear in the number of covariates, where prior methods that combine sparsity, nonlinearity, and interactions need at least quadratic time.
Findings
Outperforms existing methods on synthetic and real datasets.
Achieves orders of magnitude faster runtime.
Effectively captures nonlinear interactions and sparsity.
Abstract
Many scientific problems require identifying a small set of covariates that are associated with a target response and estimating their effects. Often, these effects are nonlinear and include interactions, so linear and additive methods can lead to poor estimation and variable selection. Unfortunately, methods that simultaneously express sparsity, nonlinearity, and interactions are computationally intractable -- with runtime at least quadratic in the number of covariates, and often worse. In the present work, we solve this computational bottleneck. We show that suitable interaction models have a kernel representation, namely there exists a "kernel trick" to perform variable selection and estimation in O(# covariates) time. Our resulting fit corresponds to a sparse orthogonal decomposition of the regression function in a Hilbert space (i.e., a functional ANOVA decomposition), where…
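To illustrate how a sum over all pairwise interactions can be evaluated in time linear in the number of covariates, here is a minimal sketch. It is not the SKIM-FA kernel itself, whose construction is not given in this summary. It only demonstrates the elementary identity sum_{i<j} k_i k_j = ((sum_i k_i)^2 - sum_i k_i^2) / 2, applied to per-covariate kernels. The one-dimensional RBF kernel, the `weights` vector (a zero weight drops a covariate from every interaction), and all function names are assumptions made for illustration.

```python
import math
from itertools import combinations


def rbf_1d(x, y, lengthscale=1.0):
    """One-dimensional RBF kernel on a single covariate (illustrative choice)."""
    return math.exp(-0.5 * ((x - y) / lengthscale) ** 2)


def pairwise_kernel_naive(x, y, weights):
    """Sum of w_i * w_j * k_i * k_j over all pairs i < j: O(p^2) terms."""
    k = [w * rbf_1d(a, b) for a, b, w in zip(x, y, weights)]
    return sum(k[i] * k[j] for i, j in combinations(range(len(k)), 2))


def pairwise_kernel_linear(x, y, weights):
    """Same quantity in O(p), using ((sum k)^2 - sum k^2) / 2."""
    s1 = 0.0  # running sum of weighted per-covariate kernels
    s2 = 0.0  # running sum of their squares
    for a, b, w in zip(x, y, weights):
        k = w * rbf_1d(a, b)
        s1 += k
        s2 += k * k
    return 0.5 * (s1 * s1 - s2)


if __name__ == "__main__":
    # Hypothetical inputs: four covariates, the second one switched off.
    x = [0.1, 0.5, -1.2, 2.0]
    y = [0.3, -0.4, 0.8, 1.5]
    weights = [1.0, 0.0, 0.5, 2.0]
    print(pairwise_kernel_naive(x, y, weights))
    print(pairwise_kernel_linear(x, y, weights))
```

The two functions agree up to floating-point rounding, but the second visits each covariate once, so its cost grows linearly with the number of covariates rather than with the number of pairs.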
Taxonomy
Topics: Gaussian Processes and Bayesian Inference · Machine Learning and Data Classification · Fault Detection and Control Systems
Methods: Greedy Policy Search
