Efficient estimation and correction of selection-induced bias with order statistics
Yann McLatchie, Aki Vehtari

TL;DR
This paper presents a fast, order-statistics-based method for estimating and correcting the bias introduced by model selection, especially in complex multi-step procedures, offering a practical alternative to computationally intensive techniques such as nested cross-validation and the bootstrap.
Contribution
The paper introduces an efficient, theoretically grounded approach to bias correction in model selection, along with diagnostic tools and implementation code that support practical use.
Findings
Reliable bias estimation demonstrated in numerical experiments
Effective correction of over-fitting in forward search
Lightweight alternative to nested cross-validation and the bootstrap
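Forward search compounds the problem: at every step the apparently best of the remaining candidates is added, so each step contributes its own selection-induced optimism. The toy sketch below assumes that no candidate truly helps and that each step's gain estimates are independent N(0, σ²) draws (real forward-search estimates are correlated across steps). It accumulates a Blom-type order-statistic approximation to the expected maximum, σ·Φ⁻¹((K − α)/(K − 2α + 1)) with α = 0.375, alongside the apparent gain. The function names and parameters are hypothetical, and this is not claimed to be the paper's procedure; it only illustrates why a per-step order-statistic correction can offset accumulated over-fitting.

```python
import random
import statistics
from statistics import NormalDist

# Hypothetical toy model of forward search; not the authors' implementation.


def expected_max_normal(k: int, sigma: float = 1.0, alpha: float = 0.375) -> float:
    """Blom-type approximation to E[max of k iid N(0, sigma^2) draws]."""
    p = (k - alpha) / (k - 2 * alpha + 1)
    return sigma * NormalDist().inv_cdf(p)


def noisy_forward_search(n_candidates: int, n_steps: int, sigma: float = 1.0, seed: int = 0):
    """Greedy forward search in which no candidate truly improves the model.

    Assumption: at each step every remaining candidate's estimated gain is
    0 + N(0, sigma^2), drawn independently of earlier steps. Returns a list of
    (cumulative apparent gain, cumulative order-statistic bias estimate).
    """
    rng = random.Random(seed)
    remaining = list(range(n_candidates))
    apparent = estimated = 0.0
    trace = []
    for _ in range(n_steps):
        gains = {j: rng.gauss(0.0, sigma) for j in remaining}
        best = max(gains, key=gains.get)  # apparently best remaining candidate
        apparent += gains[best]           # selection optimism accumulates
        estimated += expected_max_normal(len(remaining), sigma)
        remaining.remove(best)
        trace.append((apparent, estimated))
    return trace


# Average the final apparent gain over many runs and compare it with the
# accumulated order-statistic estimate (identical across runs).
finals = [noisy_forward_search(20, 5, seed=s)[-1] for s in range(500)]
mean_apparent = statistics.fmean(a for a, _ in finals)
bias_estimate = finals[0][1]
print(f"mean apparent gain={mean_apparent:.2f}  estimated bias={bias_estimate:.2f}  "
      f"corrected={mean_apparent - bias_estimate:.2f}")
```

Under these assumptions the corrected value should sit near zero, matching the true (null) gain, while the uncorrected apparent gain grows with every step.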
Abstract
Model selection aims to identify a sufficiently well-performing model that is possibly simpler than the most complex model among a pool of candidates. However, the decision-making process itself can inadvertently introduce non-negligible bias when the cross-validation estimates of predictive performance are marred by excessive noise. In finite-data regimes, cross-validated estimates can encourage the statistician to select one model over another when it is not actually better for future data. While this bias remains negligible when there are few candidate models, the expected magnitude of selection-induced bias is likely to grow as the pool of candidates grows and model selection decisions are compounded (as in step-wise selection). This paper introduces an efficient approach to estimate and correct selection-induced bias based on order statistics. Numerical experiments demonstrate the…
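To make the effect concrete, the sketch below simulates K candidate models that are all equally good (true improvement 0) and whose cross-validated improvement estimates carry independent N(0, σ²) noise; the average score of the apparently best model is then pure selection-induced bias. It compares that average with a Blom-type order-statistic approximation to the expected maximum of K normal draws, σ·Φ⁻¹((K − α)/(K − 2α + 1)) with α = 0.375. The independence and normality assumptions, the helper names, and the choice of Blom's formula are illustrative assumptions, not necessarily the paper's exact estimator.

```python
import random
import statistics
from statistics import NormalDist

# Hypothetical helpers for illustration; not the authors' implementation.


def expected_max_normal(k: int, sigma: float = 1.0, alpha: float = 0.375) -> float:
    """Blom-type approximation to E[max of k iid N(0, sigma^2) draws]."""
    p = (k - alpha) / (k - 2 * alpha + 1)
    return sigma * NormalDist().inv_cdf(p)


def simulated_selection_bias(k: int, sigma: float = 1.0, reps: int = 10_000, seed: int = 1) -> float:
    """Mean estimated improvement of the apparently best of k equally good models.

    Assumption: every candidate's true improvement is 0 and its estimate adds
    independent N(0, sigma^2) noise, so the mean of the selected (maximum)
    estimate is pure selection-induced bias.
    """
    rng = random.Random(seed)
    picks = [max(rng.gauss(0.0, sigma) for _ in range(k)) for _ in range(reps)]
    return statistics.fmean(picks)


def corrected_estimate(observed_best: float, k: int, sigma: float) -> float:
    """Subtract the order-statistic bias estimate from the selected model's score."""
    return observed_best - expected_max_normal(k, sigma)


for k in (1, 5, 20, 100):
    print(f"K={k:3d}  simulated bias={simulated_selection_bias(k):.3f}  "
          f"order-statistic estimate={expected_max_normal(k):.3f}")
```

With a single candidate there is no selection and both quantities are near zero; as K grows, the simulated bias and the approximation grow together, which is the behaviour the abstract describes for large candidate pools.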
Taxonomy
Topics: Statistical Methods in Clinical Trials · Probabilistic and Robust Engineering Design · Statistical Methods and Bayesian Inference
