On false discovery rate thresholding for classification under sparsity
Pierre Neuvial (SG), Etienne Roquain (LPMA)

TL;DR
This paper analyzes the theoretical properties of false discovery rate (FDR) thresholding used as a classification procedure under sparsity. It provides explicit convergence rates for the excess risk and suggests a choice of the target FDR level that adapts to the data size.
Contribution
The paper derives nonasymptotic oracle inequalities for the excess risk of FDR thresholding, shows that the procedure adapts to unknown sparsity, and gives explicit thresholds for optimal performance.
Findings
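The excess risk of FDR thresholding converges to zero as the number m of items to be classified tends to infinity, in a regime where the power of the Bayes rule is away from 0 and 1.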
Explicit FDR level choices improve classification performance in sparse regimes.
Theoretical results are supported by numerical experiments.
Abstract
We study the properties of false discovery rate (FDR) thresholding, viewed as a classification procedure. The "0"-class (null) is assumed to have a known density while the "1"-class (alternative) is obtained from the "0"-class either by translation or by scaling. Furthermore, the "1"-class is assumed to have a small number of elements w.r.t. the "0"-class (sparsity). We focus on densities of the Subbotin family, including Gaussian and Laplace models. Nonasymptotic oracle inequalities are derived for the excess risk of FDR thresholding. These inequalities lead to explicit rates of convergence of the excess risk to zero, as the number m of items to be classified tends to infinity and in a regime where the power of the Bayes rule is away from 0 and 1. Moreover, these theoretical investigations suggest an explicit choice for the target level of FDR thresholding, as a function of…
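To make the idea of FDR thresholding as a classifier concrete, here is a minimal sketch under stated assumptions, not the authors' implementation. It assumes the Gaussian translation model (null N(0, 1), alternative N(shift, 1)), one-sided p-values, and the Benjamini-Hochberg step-up rule as the FDR thresholding procedure. The names `gaussian_sf`, `bh_classify`, and `simulate`, and the default values of `m`, `n_alt`, `shift`, and `alpha`, are hypothetical choices for illustration; in particular, `alpha` is an arbitrary level, not the choice suggested in the paper.

```python
import math
import random


def gaussian_sf(x):
    """Upper-tail probability P(Z >= x) for a standard Gaussian Z."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def bh_classify(xs, alpha):
    """Label each item 0 (null) or 1 (alternative) with the
    Benjamini-Hochberg step-up rule at target level alpha.

    Assumption: one-sided p-values under a known N(0, 1) null.
    """
    m = len(xs)
    pvals = [gaussian_sf(x) for x in xs]
    # Indices of the items, sorted by increasing p-value.
    order = sorted(range(m), key=lambda i: pvals[i])
    # Step-up: find the largest rank k with p_(k) <= alpha * k / m.
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= alpha * rank / m:
            k = rank
    labels = [0] * m
    for i in order[:k]:
        labels[i] = 1  # the k smallest p-values are classified as "1"
    return labels


def simulate(m=1000, n_alt=50, shift=4.0, alpha=0.1, seed=0):
    """Empirical misclassification proportion on one simulated sparse sample.

    All default values are illustrative assumptions.
    """
    rng = random.Random(seed)
    truth = [1] * n_alt + [0] * (m - n_alt)
    xs = [rng.gauss(0.0, 1.0) + shift * t for t in truth]
    labels = bh_classify(xs, alpha)
    errors = sum(1 for lab, t in zip(labels, truth) if lab != t)
    return errors / m


if __name__ == "__main__":
    print(bh_classify([0.0, 0.0, 10.0], alpha=0.1))
    print(simulate())
```

The first call classifies only the large observation as "1", since its p-value is the only one below its step-up threshold. The `simulate` helper reports the fraction of misclassified items on one draw, which can be compared across values of `alpha` to see how the target level affects classification error in a sparse sample.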
