TL;DR
This empirical study compares penalized regression methods across more than 2,300 simulated and semi-synthetic scenarios to evaluate their effectiveness for prediction, variable selection and variable ranking in high-dimensional settings.
Contribution
The paper provides a large-scale empirical comparison of popular penalized regression methods for high-dimensional data, highlighting their strengths and limitations in the finite-sample settings encountered in practice.
Findings
No single method dominates across all scenarios.
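Relative performance depends strongly on data characteristics such as sample size, dimensionality, sparsity and signal strength.
The paper offers recommendations for choosing a method according to the analysis goal (prediction, selection or ranking) and the features of the data.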
Abstract
Penalized likelihood approaches are widely used for high-dimensional regression. Although many methods have been proposed and the associated theory is now well-developed, the relative efficacy of different approaches in finite-sample settings, as encountered in practice, remains incompletely understood. There is therefore a need for empirical investigations in this area that can offer practical insight and guidance to users. In this paper we present a large-scale comparison of penalized regression methods. We distinguish between three related goals: prediction, variable selection and variable ranking. Our results span more than 2,300 data-generating scenarios, including both synthetic and semi-synthetic data (real covariates and simulated responses), allowing us to systematically consider the influence of various factors (sample size, dimensionality, sparsity, signal strength and…
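The abstract separates three goals: prediction, variable selection and variable ranking. As an illustration only (this is not the paper's code, method set or benchmarking protocol), the following sketch fits the lasso, one standard penalized regression method, by cyclic coordinate descent, minimising (1/(2n))·||y − Xb||² + λ·||b||₁ with no intercept, on a simulated sparse linear model, and then scores the fit on each goal. The simulation settings (n = 50, p = 20, three nonzero coefficients, unit noise, λ = 0.1), the metrics (test MSE, support F1, top-k recall) and all function names (`soft_threshold`, `lasso_cd`, `simulate`, `evaluate`) are assumptions chosen for the example.

```python
import random


def soft_threshold(z, gamma):
    """Soft-thresholding operator: sign(z) * max(|z| - gamma, 0)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """Lasso via cyclic coordinate descent (illustrative sketch).

    Minimises (1 / (2n)) * ||y - X b||^2 + lam * ||b||_1 with no intercept,
    so it assumes the columns of X and the response y are (roughly) centred.
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    resid = list(y)  # residual y - X b; b starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the partial residual (b_j's own term added back)
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * b[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta  # keep the residual in sync with b
                b[j] = new_bj
    return b


def simulate(n, p, beta, noise_sd, rng):
    """Gaussian design with independent N(0, 1) covariates (an assumption)."""
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    y = [sum(x[j] * beta[j] for j in range(p)) + rng.gauss(0.0, noise_sd) for x in X]
    return X, y


def evaluate(b_hat, beta, X_test, y_test):
    """Score one fit on the three goals: prediction, selection, ranking."""
    # Prediction: mean squared error on held-out data
    preds = [sum(x[j] * b_hat[j] for j in range(len(b_hat))) for x in X_test]
    mse = sum((yt - yp) ** 2 for yt, yp in zip(y_test, preds)) / len(y_test)
    # Selection: estimated support {j : b_hat_j != 0} versus the true support
    true_s = {j for j, v in enumerate(beta) if v != 0.0}
    est_s = {j for j, v in enumerate(b_hat) if v != 0.0}
    tp = len(true_s & est_s)
    precision = tp / len(est_s) if est_s else 0.0
    recall = tp / len(true_s) if true_s else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall > 0 else 0.0
    # Ranking: share of true variables among the k largest |b_hat_j|, k = |true support|
    k = len(true_s)
    top_k = sorted(range(len(b_hat)), key=lambda j: abs(b_hat[j]), reverse=True)[:k]
    top_k_recall = len(true_s & set(top_k)) / k if k else 0.0
    return {"mse": mse, "f1": f1, "top_k_recall": top_k_recall}


if __name__ == "__main__":
    rng = random.Random(0)
    p = 20
    beta = [2.0, -1.5, 1.0] + [0.0] * (p - 3)  # three active variables (assumed)
    X_train, y_train = simulate(50, p, beta, 1.0, rng)
    X_test, y_test = simulate(200, p, beta, 1.0, rng)
    b_hat = lasso_cd(X_train, y_train, lam=0.1)
    print(evaluate(b_hat, beta, X_test, y_test))
```

Varying n, p, the number of nonzero coefficients or their magnitudes in this toy setup mimics, at very small scale, the kinds of factors the abstract lists (sample size, dimensionality, sparsity, signal strength). Note that a single λ gives one point on the trade-off; in practice λ is usually tuned, for example by cross-validation.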
