High-dimensional regression and variable selection using CAR scores
Verena Zuber, Korbinian Strimmer

TL;DR
The paper introduces the CAR score, a new variable ranking criterion for high-dimensional linear regression that improves variable selection and prediction accuracy, especially in genomic data analysis.
Contribution
It proposes the CAR score, a variable importance measure obtained by Mahalanobis-decorrelation of the explanatory variables, and shows in simulations that it compares favorably with existing regression methods.
Findings
CAR scores outperform elastic net and boosting in simulations.
CAR scores are effective at selecting relevant variables in genomic data.
Prediction errors and true/false positive rates compare favorably with modern regression techniques.
Abstract
Variable selection is a difficult problem that is particularly challenging in the analysis of high-dimensional genomic data. Here, we introduce the CAR score, a novel and highly effective criterion for variable ranking in linear regression based on Mahalanobis-decorrelation of the explanatory variables. The CAR score provides a canonical ordering that encourages grouping of correlated predictors and down-weights antagonistic variables. It decomposes the proportion of variance explained and it is an intermediate between marginal correlation and the standardized regression coefficient. As a population quantity, any preferred inference scheme can be applied for its estimation. Using simulations we demonstrate that variable selection by CAR scores is very effective and yields prediction errors and true and false positive rates that compare favorably with modern regression techniques such as…
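The properties stated in the abstract can be illustrated with a small numerical sketch. It assumes the CAR score is defined as ω = P^(-1/2) P_xy, where P is the correlation matrix of the predictors and P_xy is the vector of marginal correlations with the response; under that definition the exponent -1/2 sits between marginal correlation (exponent 0) and the standardized regression coefficient (exponent -1), and the squared scores sum to the proportion of variance explained. The function name `car_scores`, the simulated data, and the use of plain sample correlations are illustrative choices, not taken from the paper; plain sample estimates need more observations than predictors, so a regularized estimator would be required in genuinely high-dimensional settings.

```python
import numpy as np

def car_scores(X, y):
    """Empirical CAR scores under the assumed definition omega = P^{-1/2} P_xy.

    Uses plain sample correlations, so it requires n > p and a
    non-singular predictor correlation matrix (illustrative only).
    """
    n = X.shape[0]
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize predictors
    ys = (y - y.mean()) / y.std()               # standardize response
    P = Xs.T @ Xs / n                           # predictor correlation matrix
    p_xy = Xs.T @ ys / n                        # marginal correlations
    # Inverse matrix square root via the eigendecomposition of symmetric P.
    evals, evecs = np.linalg.eigh(P)
    P_inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    return P_inv_sqrt @ p_xy, P, p_xy

# Hypothetical toy data: two correlated relevant predictors plus one noise predictor.
rng = np.random.default_rng(0)
n = 500
z = rng.normal(size=n)
X = np.column_stack([
    z + 0.5 * rng.normal(size=n),
    z + 0.5 * rng.normal(size=n),
    rng.normal(size=n),
])
y = X[:, 0] + X[:, 1] + rng.normal(size=n)

omega, P, p_xy = car_scores(X, y)

# Proportion of variance explained by the linear model: P_xy^T P^{-1} P_xy.
r_squared = p_xy @ np.linalg.solve(P, p_xy)

# Rank variables by squared CAR score, largest first.
ranking = np.argsort(-omega ** 2)

print("CAR scores:          ", omega)
print("sum of squared CAR:  ", np.sum(omega ** 2))
print("R^2:                 ", r_squared)
print("ranking (by index):  ", ranking)
```

In this sketch the sum of squared scores matches R² up to floating-point error, and the pure-noise predictor is ranked last.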
