Handling Factors in Variable Selection Problems
Gonzalo Garcia-Donato, Rui Paulo

TL;DR
This paper addresses variable selection when the candidate predictors include both factors and numerical variables. It proposes a Bayesian approach whose results do not depend on how the factors are coded as dummy variables, and illustrates it with a childhood obesity study.
Contribution
It introduces a novel Bayesian method for variable selection with factors that is independent of dummy coding schemes, improving robustness and interpretability.
Findings
The proposed method reduces sensitivity to factor coding schemes.
Application to childhood obesity data illustrates practical effectiveness.
The approach extends standard variable selection techniques to handle factors more reliably.
Abstract
Factors are categorical variables, and the values which these variables assume are called levels. In this paper, we consider the variable selection problem where the set of potential predictors contains both factors and numerical variables. Formally, this problem is a particular case of the standard variable selection problem where factors are coded using dummy variables. As such, the Bayesian solution would be straightforward and, possibly because of this, the problem, despite its importance, has not received much attention in the literature. Nevertheless, we show that this perception is illusory and that in fact several inputs like the assignment of prior probabilities over the model space or the parameterization adopted for factors may have a large (and difficult to anticipate) impact on the results. We provide a solution to these issues that extends the proposals in the standard…
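The coding dependence mentioned in the abstract can be illustrated with a small sketch. This is not the authors' Bayesian procedure: it uses ordinary least squares on simulated data, and the helper names (`treatment_dummies`, `residual_sum_of_squares`) and the three-level factor are assumptions made purely for illustration. It codes one factor with two different reference levels and shows that the full model fits identically under both codings, while the submodels obtained by keeping a single dummy variable are not the same set of models.

```python
import numpy as np

def treatment_dummies(levels, reference):
    """Dummy-code one factor, dropping the column of the `reference` level.

    Hypothetical helper for illustration; returns (matrix, kept level names).
    """
    levels = np.asarray(levels)
    kept = [lv for lv in sorted(set(levels.tolist())) if lv != reference]
    columns = np.column_stack([(levels == lv).astype(float) for lv in kept])
    return columns, kept

def residual_sum_of_squares(X, y):
    """Least-squares fit of y on X; returns the residual sum of squares."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

# Simulated data (an assumption): only level "c" differs from the others.
rng = np.random.default_rng(0)
factor = np.array(["a", "b", "c"] * 20)
effects = {"a": 0.0, "b": 0.0, "c": 2.0}
y = np.array([effects[lv] for lv in factor]) + rng.normal(scale=0.5, size=factor.size)
intercept = np.ones((factor.size, 1))

# Two codings of the same factor.
D_a, names_a = treatment_dummies(factor, reference="a")  # dummies for b, c
D_c, names_c = treatment_dummies(factor, reference="c")  # dummies for a, b

# Full factor: both codings span the same column space, so the fit is identical.
rss_full_a = residual_sum_of_squares(np.hstack([intercept, D_a]), y)
rss_full_c = residual_sum_of_squares(np.hstack([intercept, D_c]), y)

# Single-dummy submodels: which contrasts are available depends on the coding.
sub_a = {n: residual_sum_of_squares(np.hstack([intercept, D_a[:, [j]]]), y)
         for j, n in enumerate(names_a)}
sub_c = {n: residual_sum_of_squares(np.hstack([intercept, D_c[:, [j]]]), y)
         for j, n in enumerate(names_c)}

print("full model RSS, reference a:", rss_full_a)
print("full model RSS, reference c:", rss_full_c)
print("single-dummy RSS, reference a:", sub_a)
print("single-dummy RSS, reference c:", sub_c)
```

With reference level "a", the candidate submodels include "c versus the rest", which matches how the data were simulated. With reference level "c", that contrast is not available among the single-dummy submodels, so a procedure that selects individual dummy variables explores a different model space depending on the coding.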
Taxonomy
Topics: Bayesian Modeling and Causal Inference · Statistical Methods and Inference · Fuzzy Systems and Optimization
