A Scalable Empirical Bayes Approach to Variable Selection
Haim Y. Bar, James G. Booth, Martin T. Wells

TL;DR
This paper introduces a scalable empirical Bayes method for variable selection in high-dimensional linear regression, efficiently identifying relevant predictors when the number of variables exceeds the number of observations.
Contribution
It proposes a novel three-component mixture model, fitted with an EM algorithm, for fast and scalable variable selection in 'large p, small n' settings.
Findings
Efficient variable selection in high-dimensional data.
Faster convergence compared to simulation-based methods.
Applicable to large-scale linear regression problems.
Abstract
We develop a model-based empirical Bayes approach to variable selection problems in which the number of predictors is very large, possibly much larger than the number of responses (the so-called 'large p, small n' problem). We consider the multiple linear regression setting, where the response is assumed to be a continuous variable and it is a linear function of the predictors plus error. The explanatory variables in the linear model can have a positive effect on the response, a negative effect, or no effect. We model the effects of the linear predictors as a three-component mixture in which a key assumption is that only a small (unknown) fraction of the candidate predictors have a non-zero effect on the response variable. By treating the coefficients as random effects we develop an approach that is computationally efficient because the number of parameters that have to be estimated is…
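The data-generating setup described in the abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' implementation: the function name `simulate_three_component`, the Gaussian design matrix, the mixing fractions, and the folded-normal effect magnitudes are all hypothetical choices made for illustration. It shows each coefficient drawn from one of three components (negative effect, no effect, positive effect), with only a small fraction non-zero and p much larger than n.

```python
import numpy as np

def simulate_three_component(n=50, p=500, frac_pos=0.02, frac_neg=0.02,
                             effect_mean=2.0, effect_sd=0.5,
                             noise_sd=1.0, seed=0):
    """Simulate y = X @ beta + error with three-component mixture coefficients.

    All defaults are illustrative assumptions, not values from the paper.
    """
    rng = np.random.default_rng(seed)

    # Design matrix with far more predictors than observations (p >> n).
    X = rng.standard_normal((n, p))

    # Component label per predictor: -1 (negative), 0 (null), +1 (positive).
    frac_null = 1.0 - frac_pos - frac_neg
    gamma = rng.choice([-1, 0, 1], size=p, p=[frac_neg, frac_null, frac_pos])

    # Effect magnitudes; the absolute value keeps the sign tied to gamma
    # (an assumption made here for simplicity).
    magnitude = np.abs(rng.normal(effect_mean, effect_sd, size=p))
    beta = gamma * magnitude

    # Continuous response: linear function of the predictors plus error.
    y = X @ beta + rng.normal(0.0, noise_sd, size=n)
    return X, y, beta, gamma
```

Calling `X, y, beta, gamma = simulate_three_component()` yields a 50-by-500 design in which most entries of `beta` are exactly zero, which is the sparsity assumption the mixture model relies on.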
Taxonomy
Topics: Statistical Methods and Inference · Bayesian Methods and Mixture Models · Statistical Methods and Bayesian Inference
