The effect of collinearity and sample size on linear regression results: a simulation study
Stephanie CC van der Lubbe, Jose M Valderas, Evangelos Kontopantelis

TL;DR
This simulation study investigates how collinearity and sample size jointly influence linear regression accuracy, emphasizing that VIF thresholds should be context-dependent and highlighting the risks of ignoring sample size and bias.
Contribution
It provides a comprehensive simulation-based analysis of collinearity effects across various sample sizes, offering practical guidance for interpreting VIFs and understanding bias amplification.
Findings
Collinearity reduces precision in small samples but has limited impact on coverage in large samples.
Bias amplification due to collinearity is significant under model misspecification.
VIF thresholds should be adapted based on sample size and study context.
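The link between VIF and sample size in the findings above can be read off the standard textbook expression for the sampling variance of an OLS coefficient. It is a general result rather than something quoted from the paper, and the notation is assumed here: $\sigma^2$ is the error variance, $s_j^2$ the sample variance of predictor $x_j$, and $R_j^2$ the R-squared from regressing $x_j$ on the other predictors.

```latex
\operatorname{Var}(\hat\beta_j)
  = \frac{\sigma^2}{(n-1)\, s_j^2} \cdot \mathrm{VIF}_j,
\qquad
\mathrm{VIF}_j = \frac{1}{1 - R_j^2}
```

Under homoskedastic errors the standard error therefore scales with $\sqrt{\mathrm{VIF}_j/(n-1)}$. A VIF of 10 at $n = 1000$ gives roughly the same standard error as a VIF of 1 at $n \approx 100$, which is why a single fixed cut-off cannot fit every sample size.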
Abstract
Background: Multicollinearity inflates the variance of OLS coefficients, widening confidence intervals and reducing inferential reliability. Yet fixed variance inflation factor (VIF) cut-offs are often applied uniformly across studies with very different sample sizes, even though collinearity is a finite-sample problem. We quantify how collinearity and sample size jointly affect linear regression performance and provide practical guidance for interpreting VIFs.
Methods: We simulated data across sample sizes N=100-100,000 and collinearity levels VIF=1-50. For each scenario we generated 1,000 datasets, fitted OLS models, and assessed coverage, mean absolute error (MAE), bias, traditional power (CI excludes 0), and precision assurance (probability the 95% CI lies within a prespecified margin around the true effect). We also evaluated a biased, misspecified setting by omitting a relevant…
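The simulation design described in the Methods can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code. It assumes only two standard-normal predictors whose correlation is chosen to hit a target VIF, unit true effects, Gaussian noise, and a normal-approximation critical value of 1.96. The function name `simulate` and all of its parameters are hypothetical.

```python
import numpy as np


def simulate(n, vif, n_sims=200, beta=(1.0, 1.0), sigma=1.0, seed=0):
    """Return (coverage, mean 95% CI half-width) for the first slope.

    Assumption: with two predictors, VIF = 1 / (1 - rho^2), so the
    correlation that yields a target VIF is rho = sqrt(1 - 1 / VIF).
    """
    rng = np.random.default_rng(seed)
    rho = np.sqrt(1.0 - 1.0 / vif)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    beta = np.asarray(beta, dtype=float)

    covered = 0
    half_widths = []
    for _ in range(n_sims):
        # Draw correlated predictors and the outcome.
        X = rng.multivariate_normal(np.zeros(2), cov, size=n)
        y = X @ beta + rng.normal(0.0, sigma, size=n)

        # Fit OLS with an intercept.
        A = np.column_stack([np.ones(n), X])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)

        # Standard error of the first slope from the residual variance.
        resid = y - A @ coef
        s2 = resid @ resid / (n - A.shape[1])
        se = np.sqrt(s2 * np.linalg.inv(A.T @ A)[1, 1])

        # Approximate 95% CI (normal critical value, an assumption).
        hw = 1.96 * se
        covered += abs(coef[1] - beta[0]) <= hw
        half_widths.append(hw)

    return covered / n_sims, float(np.mean(half_widths))


if __name__ == "__main__":
    for n in (100, 1000):
        for vif in (1.0, 10.0):
            cov_rate, hw = simulate(n, vif)
            print(f"n={n:5d} VIF={vif:4.1f} coverage={cov_rate:.3f} half-width={hw:.3f}")
```

Running the grid shows the pattern the abstract describes: coverage stays near its nominal level, while interval width grows with VIF and shrinks with sample size. Extending the sketch to the other metrics (MAE, power, precision assurance) would need the margins and effect sizes defined in the paper, which are not reproduced here.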
Taxonomy
Topics: Advanced Statistical Methods and Models · Advanced Causal Inference Techniques · Statistical Methods in Epidemiology
