Confidence Interval Estimation of Predictive Performance in the Context of AutoML
Konstantinos Paraschakis, Andrea Castellani, Giorgos Borboudakis, Ioannis Tsamardinos

TL;DR
This paper evaluates methods for estimating confidence intervals of predictive performance in AutoML, addressing the challenge of bias due to multiple pipeline selection, and introduces a more efficient variant of an existing method.
Contribution
It provides a comprehensive comparison of 9 CI estimation methods in AutoML, including a new efficient variant of BBC called BBC-F, extending analysis to small and imbalanced datasets.
Findings
BBC-F and BBC outperform other methods in coverage and tightness.
BBC-F is more computationally efficient than the original BBC.
The study extends CI estimation evaluation to small-sample and imbalanced datasets.
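The two quality criteria named in the findings, coverage and tightness, can be made concrete with a small sketch. It assumes coverage means the fraction of intervals that contain the true performance and tightness means the average interval width. The function name `evaluate_intervals` and the example numbers are hypothetical and are not taken from the paper.

```python
from typing import Sequence, Tuple


def evaluate_intervals(
    intervals: Sequence[Tuple[float, float]],
    true_values: Sequence[float],
) -> Tuple[float, float]:
    """Return (coverage, mean_width) for a collection of confidence intervals.

    coverage   : fraction of intervals whose bounds contain the true value
    mean_width : average of (upper - lower); smaller means tighter intervals
    """
    if len(intervals) != len(true_values) or not intervals:
        raise ValueError("need one true value per interval, and at least one interval")

    hits = 0
    total_width = 0.0
    for (lower, upper), truth in zip(intervals, true_values):
        if lower <= truth <= upper:
            hits += 1
        total_width += upper - lower

    n = len(intervals)
    return hits / n, total_width / n


# Hypothetical example: four 95% CIs for accuracy and the corresponding true accuracies.
example_intervals = [(0.70, 0.90), (0.60, 0.80), (0.80, 0.95), (0.50, 0.70)]
example_truths = [0.85, 0.82, 0.90, 0.60]
coverage, mean_width = evaluate_intervals(example_intervals, example_truths)
print(f"coverage={coverage:.2f}, mean width={mean_width:.4f}")
```

Under this reading, a well-calibrated 95% method would reach coverage of at least 0.95 over many datasets, and among methods that do, the one with the smaller mean width is the more informative.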
Abstract
Any supervised machine learning analysis is required to provide an estimate of the out-of-sample predictive performance. However, it is imperative to also provide a quantification of the uncertainty of this performance in the form of a confidence or credible interval (CI) and not just a point estimate. In an AutoML setting, estimating the CI is challenging due to the "winner's curse", i.e., the bias of estimation due to cross-validating several machine learning pipelines and selecting the winning one. In this work, we perform a comparative evaluation of 9 state-of-the-art methods and variants in CI estimation in an AutoML setting on a corpus of real and simulated datasets. The methods are compared in terms of inclusion percentage (whether a 95% CI includes the true performance at least 95% of the time), CI tightness (tighter CIs are preferable as being more informative), and execution…
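The "winner's curse" described in the abstract can be illustrated with a toy simulation. This is a minimal sketch under simplifying assumptions that are not from the paper: every candidate pipeline has the same true accuracy, each is scored once on an independent evaluation set (a stand-in for cross-validation), and the one with the highest estimate is selected. The function name `simulate_winners_curse` and all parameter values are hypothetical.

```python
import random


def simulate_winners_curse(
    n_pipelines: int = 20,
    n_samples: int = 50,
    true_accuracy: float = 0.7,
    n_repeats: int = 500,
    seed: int = 0,
) -> float:
    """Return the average estimated accuracy of the *selected* pipeline.

    All pipelines share the same true accuracy, so any gap between the
    returned value and `true_accuracy` is purely selection bias.
    """
    rng = random.Random(seed)
    total_best = 0.0
    for _ in range(n_repeats):
        best_estimate = 0.0
        for _ in range(n_pipelines):
            # Each evaluation sample is classified correctly with probability true_accuracy.
            correct = sum(rng.random() < true_accuracy for _ in range(n_samples))
            best_estimate = max(best_estimate, correct / n_samples)
        total_best += best_estimate
    return total_best / n_repeats


mean_selected = simulate_winners_curse()
print(f"true accuracy: 0.70, mean estimate of the selected pipeline: {mean_selected:.3f}")
```

With a single pipeline the average estimate stays close to the true accuracy, while selecting the best of many pushes it upward. That optimistic gap is what a CI method for AutoML has to account for.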
Taxonomy
Topics: Machine Learning and Data Classification
