A Feature Selection Method that Controls the False Discovery Rate
Mehdi Rostami, Olli Saarela

TL;DR
This paper introduces Data Splitting Selection (DSS), a distribution-free feature selection method that controls the false discovery rate (FDR) with theoretical guarantees while retaining high power.
Contribution
The paper proposes a novel, assumption-free feature selection method (DSS) that controls FDR and improves power compared to existing techniques.
Findings
DSS effectively controls FDR in simulations.
A higher-power variant of DSS "almost" controls FDR.
Extensive simulations demonstrate superior performance over existing methods.
Abstract
Selecting a handful of truly relevant variables in supervised machine learning is challenging: existing approaches rest on untestable assumptions, and theoretical assurances that selection errors are under control are lacking. We propose a distribution-free feature selection method, referred to as Data Splitting Selection (DSS), which controls the False Discovery Rate (FDR) of feature selection while obtaining high power. We also propose a second version of DSS with higher power that "almost" controls the FDR. No assumptions are made on the distribution of the response or on the joint distribution of the features. Extensive simulations compare the performance of the proposed methods with existing ones.
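For readers unfamiliar with the two quantities the abstract trades off, the following is a minimal sketch of how they are commonly estimated in a simulation study where the truly relevant features are known. It is not the DSS procedure itself, which this page does not describe. It uses the standard definitions: the false discovery proportion (FDP) is the share of selected features that are irrelevant (taken as 0 when nothing is selected), the FDR is the expected FDP, and power is the expected share of relevant features that are selected. The function names `fdp_and_power` and `estimate_fdr_and_power` are hypothetical, chosen only for illustration.

```python
from typing import Iterable, List, Tuple


def fdp_and_power(selected: Iterable[int], relevant: Iterable[int]) -> Tuple[float, float]:
    """Return (false discovery proportion, power) for one selection.

    `selected` holds the indices a selection method returned;
    `relevant` holds the indices of the truly relevant features,
    which are known only in a simulation.
    """
    selected_set = set(selected)
    relevant_set = set(relevant)
    false_pos = len(selected_set - relevant_set)
    true_pos = len(selected_set & relevant_set)
    # max(..., 1) applies the usual convention that FDP is 0 for an empty selection.
    fdp = false_pos / max(len(selected_set), 1)
    power = true_pos / max(len(relevant_set), 1)
    return fdp, power


def estimate_fdr_and_power(
    runs: List[Tuple[Iterable[int], Iterable[int]]]
) -> Tuple[float, float]:
    """Average FDP and power over simulation replicates.

    The mean FDP across replicates is the Monte Carlo estimate of the FDR.
    """
    if not runs:
        raise ValueError("at least one simulation run is required")
    results = [fdp_and_power(sel, rel) for sel, rel in runs]
    fdr = sum(r[0] for r in results) / len(results)
    power = sum(r[1] for r in results) / len(results)
    return fdr, power


if __name__ == "__main__":
    # Two hypothetical replicates with features 0..4 truly relevant.
    runs = [([0, 1, 2, 7], [0, 1, 2, 3, 4]), ([0, 1], [0, 1, 2, 3, 4])]
    print(estimate_fdr_and_power(runs))
```

A method "controls FDR at level q" when this expected proportion stays at or below q; in a simulation this is checked by comparing the averaged FDP against the target level.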
Taxonomy
Topics: Machine Learning and Data Classification · Imbalanced Data Classification Techniques · Statistical Methods and Inference
