Bayesian Variable Selection in a Million Dimensions
Martin Jankowiak

TL;DR
This paper introduces a scalable MCMC algorithm for Bayesian variable selection that efficiently handles extremely high-dimensional data and extends to generalized linear models, demonstrating effectiveness on biological datasets.
Contribution
The paper presents a novel sublinear-cost MCMC scheme for Bayesian variable selection in very high dimensions and extends it to count data models like binomial and negative binomial regression.
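For orientation, the relationship between binomial and logistic regression can be written out as below. This is a sketch in assumed notation: the symbols $C_i$ (trial count), $\gamma_j$ (inclusion indicator), $\beta_j$ (coefficient) and $\psi_i$ (linear predictor) are chosen here for illustration and are not taken from the source.

```latex
% Assumed notation: gamma_j in {0,1} switches covariate j on or off.
\psi_i = \beta_0 + \sum_{j=1}^{P} \gamma_j \, \beta_j \, X_{ij},
\qquad
Y_i \mid \psi_i \sim \mathrm{Binomial}\!\left(C_i, \; \frac{1}{1 + e^{-\psi_i}}\right)

% Setting C_i = 1 for every i gives a Bernoulli likelihood,
% i.e. ordinary logistic regression:
C_i = 1 \;\Longrightarrow\; Y_i \mid \psi_i \sim \mathrm{Bernoulli}\!\left(\frac{1}{1 + e^{-\psi_i}}\right)
```

Variable selection then amounts to inferring which $\gamma_j$ are 1 given the observed counts.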
Findings
Efficient variable selection in million-dimensional datasets.
Successful application to genomic data in biology.
Extension to generalized linear models for count data.
Abstract
Bayesian variable selection is a powerful tool for data analysis, as it offers a principled method for variable selection that accounts for prior information and uncertainty. However, wider adoption of Bayesian variable selection has been hampered by computational challenges, especially in difficult regimes with a large number of covariates P or non-conjugate likelihoods. To scale to the large P regime we introduce an efficient MCMC scheme whose cost per iteration is sublinear in P. In addition we show how this scheme can be extended to generalized linear models for count data, which are prevalent in biology, ecology, economics, and beyond. In particular we design efficient algorithms for variable selection in binomial and negative binomial regression, which includes logistic regression as a special case. In experiments we demonstrate the effectiveness of our methods, including on…
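To make the setup in the abstract concrete, here is a minimal sketch of MCMC-based Bayesian variable selection for the simplest conjugate case. It is not the paper's algorithm: it assumes Gaussian linear regression with known noise variance and uses a plain single-flip Metropolis sampler, so it makes no attempt to reproduce the sublinear per-iteration cost or the count-data extensions described above. The function names (`log_marginal`, `sample_inclusion`) and hyperparameters (`tau2`, `sigma2`, `prior_incl`) are hypothetical, chosen for illustration.

```python
import numpy as np


def log_marginal(X, y, gamma, tau2=1.0, sigma2=1.0):
    """Log marginal likelihood of y given the inclusion vector gamma.

    Assumed model (illustrative, not from the paper):
    y = X[:, gamma] @ beta + noise, noise ~ N(0, sigma2),
    beta_j ~ N(0, tau2 * sigma2), with beta integrated out analytically.
    """
    n = y.shape[0]
    idx = np.flatnonzero(gamma)
    quad = y @ y
    logdet = 0.0
    if idx.size > 0:
        Xg = X[:, idx]
        A = Xg.T @ Xg + np.eye(idx.size) / tau2
        b = Xg.T @ y
        # Woodbury identity: work with a k x k system instead of n x n.
        quad -= b @ np.linalg.solve(A, b)
        # Determinant lemma: det(I_n + tau2 * Xg Xg^T) = det(tau2 * A).
        _, logdet_A = np.linalg.slogdet(A)
        logdet = logdet_A + idx.size * np.log(tau2)
    return -0.5 * (n * np.log(2 * np.pi * sigma2) + logdet + quad / sigma2)


def sample_inclusion(X, y, n_iter=3000, prior_incl=0.1, seed=0):
    """Estimate posterior inclusion probabilities by flipping one
    randomly chosen inclusion indicator per iteration (Metropolis)."""
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    gamma = np.zeros(p, dtype=bool)
    log_prior_odds = np.log(prior_incl) - np.log1p(-prior_incl)
    current = log_marginal(X, y, gamma)
    counts = np.zeros(p)
    for _ in range(n_iter):
        j = rng.integers(p)
        proposal = gamma.copy()
        proposal[j] = not proposal[j]
        candidate = log_marginal(X, y, proposal)
        # The proposal is symmetric, so the acceptance ratio is the
        # posterior ratio: marginal likelihood ratio times prior ratio.
        log_ratio = candidate - current
        log_ratio += log_prior_odds if proposal[j] else -log_prior_odds
        if np.log(rng.uniform()) < log_ratio:
            gamma, current = proposal, candidate
        counts += gamma
    # No burn-in is discarded; this is kept deliberately simple.
    return counts / n_iter
```

Calling `sample_inclusion(X, y)` on a design matrix `X` of shape `(N, P)` and a response vector `y` of length `N` returns a length-`P` vector of estimated inclusion probabilities. Because each step proposes a uniformly random covariate, most proposals in a large-`P` problem target irrelevant variables, which is one reason a naive sampler like this one does not scale.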
Taxonomy
Topics: Bayesian Methods and Mixture Models · Statistical Methods and Bayesian Inference · Analytical Chemistry and Chromatography
Methods: Logistic Regression
