Post-Selection Confidence Bounds for Prediction Performance
Pascal Rink, Werner Brannath

TL;DR
This paper introduces a method to compute valid lower confidence bounds for multiple models selected based on their prediction performance, addressing the challenge of model selection and evaluation in machine learning.
Contribution
It proposes a universal algorithm using bootstrap tilting and maxT correction to provide confidence bounds for multiple models simultaneously, improving reliability especially with small sample sizes.
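As a rough illustration of the tilting idea for a single model, the sketch below computes a lower confidence bound for accuracy from 0/1 correctness indicators. It assumes one common formulation of bootstrap tilting: observations are reweighted exponentially, and the tilt is calibrated so that the tilted bootstrap probability of seeing an accuracy at least as large as the observed one equals alpha. For binary data the tilted bootstrap counts are binomial, so this special case can be solved exactly without resampling. The function names `binom_upper_tail` and `tilted_lower_bound` are hypothetical, and this is not the authors' implementation, which addresses several selected models simultaneously.

```python
import math
from typing import Sequence


def binom_upper_tail(n: int, k: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(
        math.comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1)
    )


def tilted_lower_bound(correct: Sequence[int], alpha: float = 0.05) -> float:
    """Lower confidence bound for accuracy via exponential tilting (sketch).

    Assumption: tilting reweights case i proportionally to exp(tau * x_i).
    For 0/1 data the reweighted sample is Bernoulli(p), so the bootstrap
    count of correct predictions is Binomial(n, p). We search for the p
    at which P(count >= observed count) equals alpha.
    """
    n, k = len(correct), sum(correct)
    if k == 0:
        return 0.0  # no correct predictions: the trivial bound
    lo, hi = 0.0, k / n
    for _ in range(60):  # bisection; the upper tail is increasing in p
        mid = (lo + hi) / 2
        if binom_upper_tail(n, k, mid) < alpha:
            lo = mid
        else:
            hi = mid
    return lo  # slightly conservative end of the final bracket
```

For a model that classifies all 20 evaluation cases correctly, `tilted_lower_bound([1] * 20)` returns 0.05 ** (1/20), about 0.861, rather than the degenerate bound of 1 that a normal approximation with zero standard error would give.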
Findings
Bounds are at least as good as those from standard methods
Bounds reliably reach the nominal coverage probability
Better performance in small-sample scenarios
Abstract
In machine learning, selecting a promising model from a potentially large number of competing models and assessing its generalization performance are critical tasks that need careful consideration. Typically, model selection and evaluation are strictly separated: the sample at hand is split into a training, validation, and evaluation set, and only a single confidence interval is computed for the prediction performance of the final selected model. We instead propose an algorithm for computing valid lower confidence bounds for multiple models that have been selected based on their prediction performance on the evaluation set, by interpreting the selection problem as a simultaneous inference problem. We use bootstrap tilting and a maxT-type multiplicity correction. The approach is universally applicable for any combination of prediction models, any model selection…
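To make the maxT-type multiplicity correction concrete, here is a minimal sketch of simultaneous lower confidence bounds for the accuracies of several models evaluated on the same cases. It swaps in a plain nonparametric bootstrap with normal-approximation standard errors in place of bootstrap tilting, so it illustrates only the multiplicity step and is not the algorithm from the paper. The function name `maxt_lower_bounds`, the 0/1 input layout, and the ad hoc clipping of the standard error are all assumptions made for this example.

```python
import math
import random


def maxt_lower_bounds(correct, alpha=0.05, n_boot=2000, seed=0):
    """Simultaneous lower bounds for model accuracies (maxT-type sketch).

    correct[m][i] is 1 if model m predicts evaluation case i correctly,
    else 0. All models must be scored on the same n cases.
    """
    rng = random.Random(seed)
    n_models, n = len(correct), len(correct[0])
    acc = [sum(row) / n for row in correct]

    # Assumption: clip the accuracy inside the standard error so that a
    # model with 0% or 100% accuracy does not get a zero standard error.
    def std_err(a):
        a = min(max(a, 0.5 / n), 1 - 0.5 / n)
        return math.sqrt(a * (1 - a) / n)

    se = [std_err(a) for a in acc]

    max_stats = []
    for _ in range(n_boot):
        # Resample cases once and reuse them for every model, which
        # preserves the dependence between the models' performances.
        idx = [rng.randrange(n) for _ in range(n)]
        max_stats.append(
            max(
                (sum(correct[m][i] for i in idx) / n - acc[m]) / se[m]
                for m in range(n_models)
            )
        )

    # Critical value: the (1 - alpha) quantile of the maximum statistic.
    max_stats.sort()
    crit = max_stats[min(n_boot - 1, math.ceil((1 - alpha) * n_boot) - 1)]
    return [max(0.0, acc[m] - crit * se[m]) for m in range(n_models)]
```

Because the critical value comes from the maximum over all candidate models, the bounds are meant to hold jointly, so whichever model is then selected carries a bound that already accounts for the selection. Adding more candidates can only raise the critical value and therefore lower each individual bound.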
Taxonomy
Topics: Machine Learning and Data Classification · Explainable Artificial Intelligence (XAI) · Machine Learning and Algorithms
