Identifying important predictors in large data bases -- multiple testing and model selection
Malgorzata Bogdan, Florian Frommlet

TL;DR
This chapter reviews and compares model selection methods for high-dimensional data, with a focus on methods that control the false discovery rate (FDR). It covers modifications of information criteria as well as penalized likelihood approaches such as SLOPE and SLOBE.
Contribution
It introduces and evaluates modifications of information criteria suitable for p > n scenarios and compares their performance with penalized likelihood methods in high-dimensional settings.
Findings
The methods in focus are those that control the FDR in terms of model identification.
Penalized likelihood methods outperform traditional criteria in high-dimensional data.
Simulation results illustrate how the performance of the different methods varies across situations.
Abstract
This is a chapter of the forthcoming Handbook of Multiple Testing. We consider a variety of model selection strategies in a high-dimensional setting, where the number of potential predictors p is large compared to the number of available observations n. In particular, modifications of information criteria which are suitable in the case of p > n are introduced and compared with a variety of penalized likelihood methods, notably SLOPE and SLOBE. The focus is on methods which control the FDR in terms of model identification. Theoretical results are provided with respect to both model identification and prediction, and various simulation results are presented which illustrate the performance of the different methods in different situations.
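To make the connection between SLOPE and FDR control concrete, here is a minimal sketch of the sorted-L1 penalty that SLOPE uses. It assumes the Benjamini-Hochberg-type weight sequence commonly used in the SLOPE literature, lambda_i = sigma * Phi^{-1}(1 - i*q/(2p)). The abstract does not state this formula, so the scaling used in the chapter may differ. The function names `bh_lambda_sequence` and `sorted_l1_penalty` are hypothetical, and the sketch only evaluates the penalty; it does not solve the SLOPE optimization problem.

```python
from statistics import NormalDist


def bh_lambda_sequence(p, q=0.1, sigma=1.0):
    """Assumed BH-type weights: lambda_i = sigma * Phi^{-1}(1 - i*q/(2p)), i = 1..p.

    q is the nominal FDR level, sigma the noise standard deviation.
    The sequence is positive and strictly decreasing for 0 < q < 1.
    """
    norm = NormalDist()
    return [sigma * norm.inv_cdf(1.0 - i * q / (2.0 * p)) for i in range(1, p + 1)]


def sorted_l1_penalty(beta, lambdas):
    """Sorted-L1 norm: sum_i lambda_i * |beta|_(i), where |beta|_(1) >= |beta|_(2) >= ...

    The largest coefficient (in absolute value) is paired with the largest weight.
    """
    magnitudes = sorted((abs(b) for b in beta), reverse=True)
    return sum(lam * m for lam, m in zip(lambdas, magnitudes))


# Illustrative values only (not taken from the chapter).
lambdas = bh_lambda_sequence(p=5, q=0.1)
beta = [0.0, -3.0, 0.5, 0.0, 1.5]
penalty = sorted_l1_penalty(beta, lambdas)
print(f"weights: {[round(l, 3) for l in lambdas]}")
print(f"sorted-L1 penalty: {penalty:.3f}")
```

With a constant weight sequence the penalty reduces to the ordinary L1 (LASSO) penalty. The decreasing weights are what tie the procedure to a step-wise multiple testing rule.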
