Comparing methods addressing multi-collinearity when developing prediction models
Artuur M. Leeuwenberg, Maarten van Smeden, Johannes A. Langendijk, Arjen van der Schaaf, Murielle E. Mauer, Karel G.M. Moons, Johannes B. Reitsma, Ewoud Schuit

TL;DR
This paper compares statistical methods for handling multicollinearity in clinical prediction models, including shrinkage, dimensionality reduction, and constrained optimization, and uses simulations to examine their effect on the stability of predictor selection.
Contribution
It provides a systematic comparison of methods addressing multicollinearity in prediction models, emphasizing their impact on predictor stability rather than predictive accuracy.
Findings
No effect of collinearity on predictive outcomes in simulations
Collinearity negatively affects predictor stability across methods
Methods that perform strong predictor selection are more affected by collinearity
Abstract
Clinical prediction models are developed widely across medical disciplines. When predictors in such models are highly collinear, unexpected or spurious predictor-outcome associations may occur, thereby potentially reducing face-validity and explainability of the prediction model. Collinearity can be dealt with by exclusion of collinear predictors, but when there is no a priori motivation (besides collinearity) to include or exclude specific predictors, such an approach is arbitrary and possibly inappropriate. We compare different methods to address collinearity, including shrinkage, dimensionality reduction, and constrained optimization. The effectiveness of these methods is illustrated via simulations. In the conducted simulations, no effect of collinearity was observed on predictive outcomes. However, a negative effect of collinearity on the stability of predictor selection was found,…
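The contrast described in the abstract (predictive performance largely unaffected, estimated predictor effects less stable) can be illustrated with a minimal sketch. It is not the authors' simulation design. It assumes a continuous outcome, two predictors with correlation `rho`, and closed-form ridge regression as the example shrinkage method; the function name `simulate`, the true coefficients, the sample size, and the penalty `lam` are all hypothetical choices made for illustration.

```python
import numpy as np

def simulate(rho, lam=0.0, n=100, reps=500, seed=0):
    """Return (SD of the first coefficient, mean test MSE) over repeated samples.

    Illustrative assumptions: two standard-normal predictors with
    correlation rho, a linear outcome with unit-variance noise, and
    no intercept (all variables have mean zero by construction).
    """
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    beta = np.array([1.0, 1.0])  # hypothetical true coefficients
    coefs, mses = [], []
    for _ in range(reps):
        # Draw one training set and one test set of n rows each.
        X = rng.multivariate_normal(np.zeros(2), cov, size=2 * n)
        y = X @ beta + rng.normal(size=2 * n)
        X_tr, y_tr, X_te, y_te = X[:n], y[:n], X[n:], y[n:]
        # Closed-form ridge estimate; lam = 0 gives ordinary least squares.
        b = np.linalg.solve(X_tr.T @ X_tr + lam * np.eye(2), X_tr.T @ y_tr)
        coefs.append(b[0])
        mses.append(np.mean((y_te - X_te @ b) ** 2))
    return float(np.std(coefs)), float(np.mean(mses))

for rho in (0.0, 0.95):
    for lam in (0.0, 10.0):
        sd, mse = simulate(rho, lam)
        print(f"rho={rho:.2f} lambda={lam:4.1f} coef_sd={sd:.3f} test_mse={mse:.3f}")
```

Under these assumptions, the spread of the estimated coefficient across repeated samples grows as `rho` increases while the test error stays roughly constant, and the ridge penalty reduces that spread. The sketch measures coefficient variability only; it does not reproduce the predictor-selection stability analysis reported in the paper.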
Taxonomy
Topics: Machine Learning in Healthcare · Health Systems, Economic Evaluations, Quality of Life · Statistical Methods and Inference
