A Feature Selection Method that Controls the False Discovery Rate

Mehdi Rostami; Olli Saarela

arXiv:2208.02948 · stat.ME · November 10, 2023

Open Access

TL;DR

This paper introduces Data Splitting Selection (DSS), a distribution-free feature selection method that controls the false discovery rate (FDR) with theoretical guarantees while maintaining high power.

Contribution

The paper proposes DSS, a novel feature selection method that controls the FDR without assumptions on the distribution of the response or the joint distribution of the features, and that improves power compared to existing techniques.

Findings

01. DSS effectively controls the FDR in simulations.

02. A higher-power version of DSS nearly controls the FDR.

03. Extensive simulations demonstrate superior performance over existing methods.

Abstract

Selecting a handful of truly relevant variables in supervised machine learning is challenging: it typically relies on untestable assumptions and lacks theoretical assurances that selection errors are under control. We propose a distribution-free feature selection method, referred to as Data Splitting Selection (DSS), which controls the False Discovery Rate (FDR) of feature selection while achieving high power. We also propose a second version of DSS with higher power that "almost" controls the FDR. No assumptions are made on the distribution of the response or on the joint distribution of the features. Extensive simulations compare the performance of the proposed methods with existing ones.
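
To make the two quantities named in the abstract concrete, here is a minimal sketch of how the false discovery proportion (FDP) and power of a single selection are commonly computed in a simulation where the truly relevant features are known. The FDR is the expectation of the FDP over repeated experiments. This is not the DSS procedure itself; the function name `fdp_and_power` and the example index sets are hypothetical, chosen only for illustration.

```python
def fdp_and_power(selected, relevant):
    """Return (false discovery proportion, power) for one selection.

    selected: indices of features chosen by some selection method
    relevant: indices of features that truly affect the response
              (known only in simulations)
    """
    selected, relevant = set(selected), set(relevant)
    true_pos = len(selected & relevant)    # relevant features that were selected
    false_pos = len(selected - relevant)   # irrelevant features that were selected
    # Dividing by max(., 1) makes the FDP equal to 0 when nothing is selected.
    fdp = false_pos / max(len(selected), 1)
    power = true_pos / max(len(relevant), 1)
    return fdp, power


# Hypothetical example: 5 relevant features, 4 selected, 1 of them false.
fdp, power = fdp_and_power(selected=[0, 1, 2, 7], relevant=[0, 1, 2, 3, 4])
print(fdp, power)
```

Averaging `fdp` over many simulated data sets gives an empirical estimate of the FDR, which can then be compared with the target level.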

Taxonomy

Topics: Machine Learning and Data Classification · Imbalanced Data Classification Techniques · Statistical Methods and Inference