Gibbs posterior for variable selection in high-dimensional classification and data mining
Wenxin Jiang, Martin A. Tanner

TL;DR
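This paper introduces a Gibbs posterior approach to variable selection in high-dimensional classification. The approach targets a risk function of practical interest (such as classification error) directly, instead of modeling the data probabilistically. It can therefore improve on traditional Bayesian methods whose probability model may be misspecified, and it comes with a practical MCMC algorithm.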
Contribution
It develops a novel Gibbs posterior framework for Bayesian variable selection that does not rely on a probabilistic model of the data and is suited to high-dimensional settings.
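As a rough illustration (the notation is assumed here, not copied from the paper), a Gibbs posterior replaces the likelihood in Bayes' rule with an exponentiated empirical risk. For a model indexed by a variable subset γ with coefficients β_γ, data D_n = {(x_i, y_i)}, i = 1, ..., n, labels y_i ∈ {-1, +1}, a prior π, and a temperature λ > 0, one common form is:

```latex
\pi_n(\gamma, \beta_\gamma \mid D_n) \;\propto\;
  \exp\{-\lambda \, n \, R_n(\gamma, \beta_\gamma)\} \, \pi(\gamma, \beta_\gamma),
\qquad
R_n(\gamma, \beta_\gamma) = \frac{1}{n} \sum_{i=1}^{n}
  \mathbf{1}\{\, y_i \neq \operatorname{sign}(x_{i\gamma}^{\top} \beta_\gamma) \,\}
```

Here R_n is the empirical classification error. Larger λ concentrates the posterior on low-risk models, and at no point is a conditional distribution of y given x specified.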
Findings
Achieves good risk performance even when the number of candidate variables far exceeds the sample size
Provides conditions for effective variable selection in high dimensions
Develops a practical MCMC algorithm for implementation
Abstract
In the popular approach of "Bayesian variable selection" (BVS), one uses prior and posterior distributions to select a subset of candidate variables to enter the model. A completely new direction will be considered here to study BVS with a Gibbs posterior originating in statistical mechanics. The Gibbs posterior is constructed from a risk function of practical interest (such as the classification error) and aims at minimizing a risk function without modeling the data probabilistically. This can improve the performance over the usual Bayesian approach, which depends on a probability model that may be misspecified. Conditions will be provided to achieve good risk performance, even in the presence of high dimensionality, when the number of candidate variables can be much larger than the sample size. In addition, we develop a convenient Markov chain Monte Carlo algorithm to…
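The abstract is cut off before it describes the authors' MCMC algorithm, so the following is only a minimal, hypothetical Metropolis-Hastings sketch of sampling a Gibbs posterior over variable subsets. It assumes a linear classifier sign(Σ_{j∈γ} β_j x_j), the empirical misclassification rate as the risk R_n, an independent Bernoulli(q) inclusion prior, N(0, 1) coefficient priors, and a temperature λ (`lam`). The function names and every default value are illustrative and are not the paper's method. The sampler targets exp(-λ n R_n) × prior and reports how often each variable is included.

```python
import math
import random


def empirical_risk(X, y, gamma, beta):
    """Misclassification rate of the rule sign(sum over included j of beta_j * x_j).

    Labels y are assumed to be in {-1, +1}; a score of exactly 0 is predicted as -1.
    """
    errors = 0
    for row, label in zip(X, y):
        score = sum(b * x for g, b, x in zip(gamma, beta, row) if g)
        pred = 1 if score > 0 else -1
        errors += pred != label
    return errors / len(y)


def gibbs_posterior_mcmc(X, y, lam=1.0, q=0.1, step=0.5, n_iter=5000, seed=0):
    """Hypothetical Metropolis-Hastings sampler for a Gibbs posterior over (gamma, beta).

    Unnormalized target:
        exp(-lam * n * R_n(gamma, beta))
        * prod_j q**gamma_j * (1 - q)**(1 - gamma_j) * N(beta_j; 0, 1)
    Returns the fraction of iterations in which each variable was included.
    """
    rng = random.Random(seed)
    n, p = len(y), len(X[0])
    gamma = [False] * p  # start from the empty model
    beta = [rng.gauss(0.0, 1.0) for _ in range(p)]
    risk = empirical_risk(X, y, gamma, beta)
    counts = [0] * p

    for _ in range(n_iter):
        j = rng.randrange(p)
        if rng.random() < 0.5:
            # Move 1: flip the inclusion indicator of variable j (symmetric proposal).
            new_gamma = gamma[:]
            new_gamma[j] = not gamma[j]
            new_beta = beta
            log_prior_ratio = math.log(q / (1 - q))  # prior ratio for adding j
            if not new_gamma[j]:
                log_prior_ratio = -log_prior_ratio   # prior ratio for removing j
        else:
            # Move 2: Gaussian random walk on beta_j (symmetric proposal).
            new_gamma = gamma
            new_beta = beta[:]
            new_beta[j] = beta[j] + rng.gauss(0.0, step)
            log_prior_ratio = 0.5 * (beta[j] ** 2 - new_beta[j] ** 2)
        new_risk = empirical_risk(X, y, new_gamma, new_beta)
        # Log acceptance ratio: tempered risk difference plus log prior ratio.
        log_accept = -lam * n * (new_risk - risk) + log_prior_ratio
        if log_accept >= 0 or rng.random() < math.exp(log_accept):
            gamma, beta, risk = new_gamma, new_beta, new_risk
        for k in range(p):
            counts[k] += gamma[k]
    return [c / n_iter for c in counts]


# Toy usage: only variable 0 determines the label; variables 1-3 are noise.
data_rng = random.Random(1)
X = [[data_rng.uniform(-1.0, 1.0) for _ in range(4)] for _ in range(80)]
y = [1 if row[0] > 0 else -1 for row in X]
freqs = gibbs_posterior_mcmc(X, y, lam=2.0, n_iter=3000)
print([round(f, 2) for f in freqs])
```

Keeping a coefficient for every variable, including excluded ones, turns the trans-dimensional problem into a fixed-dimensional one: an excluded β_j does not affect the risk, so it simply follows its prior until the variable is proposed for inclusion. Because the 0-1 risk is a step function of β, plain Metropolis moves are used here rather than gradient-based ones.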
