High-dimensional regression with potential prior information on variable importance
Benjamin G. Stokell, Rajen D. Shah

TL;DR
This paper introduces a computationally efficient method for high-dimensional regression that leverages prior information on variable importance, improving model selection and performance in settings like missing data and time series.
Contribution
It proposes a simple, fast scheme that fits a sequence of models determined by a variable-importance ordering, and supports it with theoretical guarantees and demonstrations of practical effectiveness.
Findings
Fitting the whole sequence of models with ridge regression costs no more than a single ridge regression fit.
Theoretical bound showing only a logarithmic penalty in model selection.
Effective application to missing data, corrupted data, and time series.
Abstract
There are a variety of settings where vague prior information may be available on the importance of predictors in high-dimensional regression settings. Examples include ordering on the variables offered by their empirical variances (which is typically discarded through standardisation), the lag of predictors when fitting autoregressive models in time series settings, or the level of missingness of the variables. Whilst such orderings may not match the true importance of variables, we argue that there is little to be lost, and potentially much to be gained, by using them. We propose a simple scheme involving fitting a sequence of models indicated by the ordering. We show that the computational cost for fitting all models when ridge regression is used is no more than for a single fit of ridge regression, and describe a strategy for Lasso regression that makes use of previous fits to…
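As a rough illustration of "fitting a sequence of models indicated by the ordering", the sketch below fits ridge regression on the first k variables of an ordering for each k and picks k on held-out data. It is a naive version under stated assumptions, not the authors' algorithm: it performs one linear solve per model, so it does not reproduce the single-fit computational cost described in the abstract. The function names (`nested_ridge_path`, `select_by_validation`), the fixed penalty `lam`, the validation-based choice of k, and the simulated data are all hypothetical choices made for illustration; the ordering used is the empirical-variance ordering mentioned in the abstract.

```python
import numpy as np

def nested_ridge_path(X, y, order, lam=1.0):
    """Fit ridge on the first k variables of `order`, for k = 1..p.

    Naive illustration: one linear solve per k (assumes centred data,
    no intercept). Returns p coefficient vectors of length p, with
    zeros for variables excluded from the k-th model.
    """
    n, p = X.shape
    coefs = []
    for k in range(1, p + 1):
        idx = order[:k]
        Xk = X[:, idx]
        # Ridge solution restricted to the selected columns.
        beta_k = np.linalg.solve(Xk.T @ Xk + lam * np.eye(k), Xk.T @ y)
        beta = np.zeros(p)
        beta[idx] = beta_k
        coefs.append(beta)
    return coefs

def select_by_validation(coefs, X_val, y_val):
    """Return the model size k with the smallest validation MSE."""
    errors = [float(np.mean((y_val - X_val @ b) ** 2)) for b in coefs]
    return int(np.argmin(errors)) + 1, errors

# Simulated data (hypothetical): columns with larger variance carry the signal.
rng = np.random.default_rng(0)
n, p, lam = 200, 10, 1.0
scales = np.linspace(3.0, 0.5, p)
X = rng.normal(size=(n, p)) * scales
beta_true = np.zeros(p)
beta_true[:3] = [1.0, -0.5, 0.8]
y = X @ beta_true + rng.normal(size=n)

# Split into training and validation halves.
X_tr, y_tr = X[:100], y[:100]
X_val, y_val = X[100:], y[100:]

# Ordering by decreasing empirical variance, computed on unstandardised data.
order = np.argsort(-X_tr.var(axis=0))

coefs = nested_ridge_path(X_tr, y_tr, order, lam=lam)
k_best, errors = select_by_validation(coefs, X_val, y_val)
print("selected model size:", k_best)
```

If the ordering is uninformative, the final model in the sequence is still the ordinary ridge fit on all variables, so that fit always remains among the candidates.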
Taxonomy
Topics: Statistical Methods and Inference · Advanced Statistical Methods and Models · Bayesian Methods and Mixture Models
Methods: Test · Linear Regression
