Linear Regression, Covariate Selection and the Failure of Modelling
Laurie Davies

TL;DR
This paper argues that all model-based approaches to covariate selection in linear regression, frequentist and Bayesian alike, have failed, and presents the model-free Gaussian covariate procedure as a more reliable, efficient, and scalable alternative.
Contribution
The paper critically analyzes the failure of existing covariate selection methods and proposes a novel, model-free Gaussian covariate procedure that outperforms traditional approaches in accuracy and computational efficiency.
Findings
All 13 model-based procedures failed in a comparison based on seven data sets and three simulations.
The Gaussian covariate procedure is exact, valid, and scalable.
It outperforms all 13 model-based procedures in the comparison, judged by the covariates selected and the time required.
Abstract
It is argued that all model-based approaches to the selection of covariates in linear regression have failed. This applies to frequentist approaches based on P-values and to Bayesian approaches, although for different reasons. In the first part of the paper, 13 model-based procedures are compared to the model-free Gaussian covariate procedure in terms of the covariates selected and the time required. The comparison is based on seven data sets and three simulations. There is nothing special about these data sets, which are often used as examples in the literature. All the model-based procedures failed. In the second part of the paper, it is argued that the cause of this failure is the very use of a model. If the model involves all the available covariates, standard P-values can be used. The use of P-values in this situation is quite straightforward. As soon as the model specifies only some…
Taxonomy
Topics: Bayesian Modeling and Causal Inference
