Comparing Variable Selection and Model Averaging Methods for Logistic Regression
Nikola Sekulovski, František Bartoš, Don van den Bergh, Giuseppe Arena, Henrik R. Godmann, Vipasha Goyal, Julius M. Pfadt, Maarten Marsman, and Adrian E. Raftery

TL;DR
This study compares 28 methods for variable selection and model averaging in logistic regression. It highlights the strengths of Bayesian model averaging when separation is absent and of penalized likelihood methods such as the LASSO when separation occurs.
Contribution
It provides a comprehensive simulation-based comparison of existing methods, offering practical guidance for addressing model uncertainty in logistic regression.
Findings
BMA with g-priors performs best when separation is absent.
LASSO is the most stable method when separation occurs.
BMA with the EB-local prior is competitive in both scenarios.
Abstract
Model uncertainty is a central challenge in statistical models for binary outcomes such as logistic regression, arising when it is unclear which predictors should be included in the model. Many methods have been proposed to address this issue for logistic regression, but their relative performance under realistic conditions remains poorly understood. We therefore conducted a preregistered, simulation-based comparison of 28 established methods for variable selection and inference under model uncertainty, using 11 empirical datasets spanning a range of sample sizes and numbers of predictors, in cases both with and without separation. We found that Bayesian model averaging (BMA) methods based on g-priors, particularly g = max(n, p^2), show the strongest overall performance when separation is absent. When separation occurs, penalized likelihood approaches, especially the LASSO, provide the…
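The abstract turns on two technical notions: the g-prior setting g = max(n, p^2) and separation. The sketch below illustrates both under stated assumptions. It is not the authors' code; the function names (benchmark_g, completely_separated, log_likelihood) are hypothetical, and the separation check assumes a single predictor plus an intercept. It shows that under complete separation the logistic log-likelihood keeps increasing as the slope grows, so the unpenalized maximum likelihood estimate does not exist.

```python
import math

def benchmark_g(n: int, p: int) -> int:
    # g = max(n, p^2): n is the sample size, p the number of candidate predictors
    return max(n, p ** 2)

def softplus(z: float) -> float:
    # Numerically stable log(1 + exp(z))
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def completely_separated(x, y) -> bool:
    # Hypothetical helper (single predictor only): True if some threshold
    # on x puts every y == 0 case strictly on one side and every y == 1 case on the other
    x0 = [xi for xi, yi in zip(x, y) if yi == 0]
    x1 = [xi for xi, yi in zip(x, y) if yi == 1]
    if not x0 or not x1:
        return False  # only one class present; separation is not defined here
    return max(x0) < min(x1) or max(x1) < min(x0)

def log_likelihood(beta0: float, beta1: float, x, y) -> float:
    # Bernoulli log-likelihood of a logistic model with linear predictor beta0 + beta1 * x
    total = 0.0
    for xi, yi in zip(x, y):
        eta = beta0 + beta1 * xi
        # log p = -softplus(-eta) for y = 1; log(1 - p) = -softplus(eta) for y = 0
        total -= softplus(-eta) if yi == 1 else softplus(eta)
    return total

# Toy data: every y = 0 case has x < 0 and every y = 1 case has x > 0
x = [-2.0, -1.0, 1.0, 2.0]
y = [0, 0, 1, 1]

print(benchmark_g(n=100, p=5))   # n dominates
print(benchmark_g(n=100, p=20))  # p^2 dominates
print(completely_separated(x, y))
for slope in (1.0, 10.0, 100.0):
    # The log-likelihood climbs toward 0 as the slope grows without bound
    print(slope, log_likelihood(0.0, slope, x, y))
```

Because the log-likelihood in a separated dataset approaches its supremum of 0 only as the slope diverges, some form of shrinkage, whether a penalty such as the LASSO's or a prior such as the g-prior, is needed to keep the estimates finite.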
