High-dimensional regression in practice: an empirical study of finite-sample prediction, variable selection and ranking

Fan Wang; Sach Mukherjee; Sylvia Richardson; Steven M. Hill

arXiv:1808.00723 · stat.ME · January 29, 2020 · Stat. Comput.

TL;DR

This empirical study compares penalized regression methods across more than 2,300 synthetic and semi-synthetic data-generating scenarios, evaluating their finite-sample performance in prediction, variable selection, and variable ranking in high-dimensional settings.

Contribution

The paper provides a large-scale empirical comparison of popular high-dimensional regression methods, highlighting their strengths and limitations in practical finite-sample scenarios.

Findings

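01. No single method dominates across all scenarios.

02. Performance varies substantially with data characteristics such as sample size, dimensionality, sparsity and signal strength.

03. Recommendations are provided for choosing a method based on the goal (prediction, variable selection or ranking) and on features of the data.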

Abstract

Penalized likelihood approaches are widely used for high-dimensional regression. Although many methods have been proposed and the associated theory is now well-developed, the relative efficacy of different approaches in finite-sample settings, as encountered in practice, remains incompletely understood. There is therefore a need for empirical investigations in this area that can offer practical insight and guidance to users. In this paper we present a large-scale comparison of penalized regression methods. We distinguish between three related goals: prediction, variable selection and variable ranking. Our results span more than 2,300 data-generating scenarios, including both synthetic and semi-synthetic data (real covariates and simulated responses), allowing us to systematically consider the influence of various factors (sample size, dimensionality, sparsity, signal strength and…
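To make the three goals in the abstract concrete, the following is a minimal, dependency-free sketch of a single synthetic scenario. It is not the authors' code or protocol: the lasso solver (plain cyclic coordinate descent, no intercept), the fixed penalty `lam`, the scenario sizes, and the three metrics (test RMSE for prediction, true-positive rate and false-positive count for selection, pairwise concordance of |coefficient| for ranking) are all illustrative assumptions, and every function name here is hypothetical.

```python
import math
import random


def simulate(n, p, s, signal, noise_sd, rng):
    """Draw (X, y) from a sparse linear model with i.i.d. N(0, 1) covariates.

    The first `s` coefficients equal `signal`; the remaining p - s are zero.
    """
    beta = [signal] * s + [0.0] * (p - s)
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    y = [sum(x * b for x, b in zip(row, beta)) + rng.gauss(0.0, noise_sd)
         for row in X]
    return X, y


def soft_threshold(z, t):
    """Soft-thresholding operator: sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_iter=100):
    """Cyclic coordinate descent for (1/(2n)) * ||y - Xb||^2 + lam * ||b||_1."""
    n, p = len(X), len(X[0])
    cols = [[row[j] for row in X] for j in range(p)]
    col_sq = [sum(v * v for v in col) / n for col in cols]
    b = [0.0] * p
    resid = list(y)  # residual y - Xb, kept up to date as b changes
    for _ in range(n_iter):
        for j, col in enumerate(cols):
            # Correlation of column j with the partial residual excluding b[j].
            rho = sum(c * r for c, r in zip(col, resid)) / n + col_sq[j] * b[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                resid = [r - c * delta for r, c in zip(resid, col)]
                b[j] = new_bj
    return b


def evaluate_lasso(seed=0, n_train=60, n_test=200, p=100, s=5,
                   signal=2.0, noise_sd=1.0, lam=0.3):
    """Fit the lasso on one simulated scenario and score all three goals."""
    rng = random.Random(seed)
    X_tr, y_tr = simulate(n_train, p, s, signal, noise_sd, rng)
    X_te, y_te = simulate(n_test, p, s, signal, noise_sd, rng)
    b = lasso_cd(X_tr, y_tr, lam)

    # Prediction: root-mean-squared error on held-out data.
    preds = [sum(x * bj for x, bj in zip(row, b)) for row in X_te]
    rmse = math.sqrt(sum((yh - yi) ** 2 for yh, yi in zip(preds, y_te)) / n_test)

    # Selection: variables with non-zero coefficients vs. the true support.
    selected = {j for j, bj in enumerate(b) if bj != 0.0}
    true_pos = len(selected & set(range(s)))
    tpr = true_pos / s
    false_pos = len(selected) - true_pos

    # Ranking: fraction of (signal, null) pairs ordered correctly by |coef|,
    # with ties counted as half.
    score = [abs(bj) for bj in b]
    concordant = sum(
        1.0 if score[i] > score[j] else 0.5 if score[i] == score[j] else 0.0
        for i in range(s) for j in range(s, p)
    )
    return {
        "rmse": rmse,
        "tpr": tpr,
        "false_positives": false_pos,
        "rank_concordance": concordant / (s * (p - s)),
    }


result = evaluate_lasso()
print(result)
```

Varying `n_train`, `p`, `s` and `signal` over a grid, and repeating over seeds, gives a toy version of the kind of scenario sweep the abstract describes. A real comparison would also tune the penalty (for example by cross-validation) and include the other methods under study.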

Code & Models

Repositories

fw307/high_dimensional_regression_comparison (official)
