On the Limitation of Kernel Dependence Maximization for Feature Selection
Keli Liu, Feng Ruan

TL;DR
This paper critically examines HSIC-based dependence maximization for feature selection and shows that, despite its intuitive appeal, the approach can overlook important features.
Contribution
Through counterexamples, the paper shows that HSIC-based feature selection can fail to identify critical features. This challenges the rationale that adding an important feature to the selected set will always increase the HSIC between the selected features and the response.
Findings
HSIC maximization can miss important features
Counterexamples show limitations of dependence-based selection
Maximizing a dependence measure does not guarantee that the selected subset contains all important features
Abstract
A simple and intuitive method for feature selection consists of choosing the feature subset that maximizes a nonparametric measure of dependence between the response and the features. A popular proposal from the literature uses the Hilbert-Schmidt Independence Criterion (HSIC) as the nonparametric dependence measure. The rationale behind this approach to feature selection is that important features will exhibit a high dependence with the response and their inclusion in the set of selected features will increase the HSIC. Through counterexamples, we demonstrate that this rationale is flawed and that feature selection via HSIC maximization can miss critical features.
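To make the procedure the abstract critiques concrete, here is a minimal sketch of the generic selector: for a fixed subset size k, return the subset S of size k that maximizes the empirical HSIC between the features in S and the response. This is not the authors' code and does not reproduce their counterexamples. It assumes Gaussian kernels with fixed bandwidths, the biased (V-statistic) estimator HSIC = tr(KHLH) / n^2 with centering matrix H, and exhaustive search over subsets. The helper names `gaussian_gram`, `center`, `hsic`, and `select_by_hsic`, as well as the synthetic data, are hypothetical and chosen only for illustration.

```python
import itertools
import math
import random


def gaussian_gram(rows, sigma=1.0):
    """Gaussian (RBF) Gram matrix for a list of feature vectors (assumed kernel choice)."""
    n = len(rows)
    return [
        [
            math.exp(-sum((a - b) ** 2 for a, b in zip(rows[i], rows[j])) / (2 * sigma ** 2))
            for j in range(n)
        ]
        for i in range(n)
    ]


def center(K):
    """Return HKH, where H = I - (1/n) 11^T, by subtracting row/column means."""
    n = len(K)
    row = [sum(r) / n for r in K]
    col = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    grand = sum(row) / n
    return [[K[i][j] - row[i] - col[j] + grand for j in range(n)] for i in range(n)]


def hsic(X_rows, y, sigma_x=1.0, sigma_y=1.0):
    """Biased empirical HSIC: tr(K H L H) / n^2, computed as sum_ij (HKH)_ij L_ji / n^2."""
    n = len(y)
    Kc = center(gaussian_gram(X_rows, sigma_x))
    L = gaussian_gram([[v] for v in y], sigma_y)
    return sum(Kc[i][j] * L[j][i] for i in range(n) for j in range(n)) / n ** 2


def select_by_hsic(X, y, k):
    """Exhaustively search all size-k feature subsets; return (subset, score) with max HSIC."""
    p = len(X[0])
    best = None
    for subset in itertools.combinations(range(p), k):
        rows = [[x[j] for j in subset] for x in X]
        score = hsic(rows, y)
        if best is None or score > best[1]:
            best = (subset, score)
    return best


# Hypothetical toy data: y depends only on feature 0; features 1 and 2 are pure noise.
rng = random.Random(0)
X = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(60)]
y = [row[0] + 0.05 * rng.gauss(0, 1) for row in X]
print(select_by_hsic(X, y, 1))
```

In this easy toy case, where a single feature drives the response, the selector picks that feature. The paper's point is that this intuition does not carry over in general: its counterexamples show HSIC maximization can miss critical features. Exhaustive search is used here only to keep the sketch transparent; it scales combinatorially in the number of features.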
