Performance Prediction Under Dataset Shift
Simona Maggio, Victor Bouvier, Léo Dreyfus-Schmidt

TL;DR
This paper evaluates how well performance prediction models generalize to unseen domain shifts. It finds that error predictors outperform predictors built on shift detection metrics, and it proposes an uncertainty estimate for the predicted accuracy so that predictions can be used reliably.
Contribution
It shows the limitations of performance predictors built on shift detection metrics and introduces error predictors, paired with an uncertainty estimate, for better performance prediction under dataset shift.
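Assuming an error predictor outputs, for each target sample, the probability that the primary model misclassifies it, those per-sample probabilities can be aggregated into a predicted accuracy with an uncertainty band. The sketch below models each sample's correctness as an independent Bernoulli variable and uses a normal-approximation interval. This is an illustrative assumption, not necessarily the uncertainty estimate proposed in the paper, and `predicted_accuracy` is a hypothetical helper.

```python
import math
from typing import Sequence, Tuple


def predicted_accuracy(error_probs: Sequence[float],
                       z: float = 1.96) -> Tuple[float, float, Tuple[float, float]]:
    """Turn per-sample error probabilities into an accuracy estimate with uncertainty.

    error_probs[i] is assumed to be an error predictor's estimate of the probability
    that the primary model misclassifies sample i. Correctness of each sample is
    modelled (an assumption) as an independent Bernoulli(1 - p_i) variable.
    """
    n = len(error_probs)
    if n == 0:
        raise ValueError("need at least one sample")
    if any(not 0.0 <= p <= 1.0 for p in error_probs):
        raise ValueError("error probabilities must lie in [0, 1]")
    # Expected accuracy: average probability that the primary model is correct.
    acc = sum(1.0 - p for p in error_probs) / n
    # Std of the mean of independent Bernoulli variables: sqrt(sum p(1-p)) / n.
    std = math.sqrt(sum(p * (1.0 - p) for p in error_probs)) / n
    # Normal-approximation interval, clipped to the valid range [0, 1].
    return acc, std, (max(0.0, acc - z * std), min(1.0, acc + z * std))


# Usage with made-up probabilities from a hypothetical error predictor.
acc, std, (low, high) = predicted_accuracy([0.1, 0.2, 0.05, 0.4])
```

The normal approximation is only reasonable when the target set is large, and the independence assumption ignores any miscalibration of the error predictor itself, so the band should be read as a rough guide rather than a guarantee.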
Findings
Error predictors outperform predictors based on shift detection metrics on unseen domains.
Uncertainty estimation improves the reliability of performance predictions.
Models trained on synthetic shifts generalize better to real domain changes.
Abstract
ML models deployed in production often have to face unknown domain changes, fundamentally different from their training settings. Performance prediction models carry out the crucial task of measuring the impact of these changes on model performance. We study the generalization capabilities of various performance prediction models to new domains by learning on generated synthetic perturbations. Empirical validation on a benchmark of ten tabular datasets shows that models based upon state-of-the-art shift detection metrics are not expressive enough to generalize to unseen domains, while Error Predictors bring a consistent improvement in performance prediction under shift. We additionally propose a natural and effortless uncertainty estimation of the predicted accuracy that ensures reliable use of performance predictors. Our implementation is available at https:…
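The general recipe of learning a performance predictor from synthetic perturbations can be sketched as follows: shift labelled held-out data with perturbations of increasing severity, record a label-free shift statistic and the true accuracy on each shifted copy, and regress accuracy on the statistic. Everything concrete here is an assumption for illustration: the toy 1-D data and threshold classifier, Gaussian-noise shifts, the two-sample Kolmogorov-Smirnov distance as the shift-detection metric, and a straight-line fit. All function names are hypothetical, and this is not the paper's benchmark setup.

```python
import random
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[float, int]  # (feature value, true label)


def model_predict(x: float) -> int:
    """Hypothetical primary model: a fixed 1-D threshold classifier."""
    return int(x >= 0.0)


def make_data(n: int, rng: random.Random) -> List[Sample]:
    """Toy labelled data: class 0 centred at -1, class 1 at +1, std 0.7."""
    data = []
    for _ in range(n):
        y = rng.randint(0, 1)
        data.append((rng.gauss(2.0 * y - 1.0, 0.7), y))
    return data


def gaussian_noise_shift(data: Sequence[Sample], sigma: float,
                         rng: random.Random) -> List[Sample]:
    """Synthetic covariate shift: add N(0, sigma^2) noise to each feature; labels are kept."""
    return [(x + rng.gauss(0.0, sigma), y) for x, y in data]


def accuracy(data: Sequence[Sample]) -> float:
    return sum(model_predict(x) == y for x, y in data) / len(data)


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov distance: largest gap between the empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    gap = 0.0
    while i < len(a) and j < len(b):
        v = min(a[i], b[j])
        # Step both empirical CDFs past every copy of v before comparing them.
        while i < len(a) and a[i] == v:
            i += 1
        while j < len(b) and b[j] == v:
            j += 1
        gap = max(gap, abs(i / len(a) - j / len(b)))
    return gap


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of ys ~ slope * xs + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0.0:
        raise ValueError("shift statistic is constant; use more varied severities")
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def train_performance_predictor(heldout: Sequence[Sample], severities: Sequence[float],
                                rng: random.Random) -> Callable[[Sequence[float]], float]:
    """Regress observed accuracy on a shift statistic over synthetically shifted held-out copies."""
    ref = [x for x, _ in heldout]
    stats, accs = [], []
    for sigma in severities:
        shifted = gaussian_noise_shift(heldout, sigma, rng)
        stats.append(ks_statistic(ref, [x for x, _ in shifted]))
        accs.append(accuracy(shifted))  # labels are known for synthetic shifts
    slope, intercept = fit_line(stats, accs)

    def predict(target_features: Sequence[float]) -> float:
        # At deployment only unlabelled target features are needed.
        est = slope * ks_statistic(ref, target_features) + intercept
        return min(1.0, max(0.0, est))

    return predict


rng = random.Random(0)
heldout = make_data(5000, rng)
predict_accuracy = train_performance_predictor(heldout, [0.0, 0.25, 0.5, 1.0, 1.5, 2.0], rng)
target = gaussian_noise_shift(make_data(5000, rng), 0.8, rng)
estimated = predict_accuracy([x for x, _ in target])
actual = accuracy(target)
```

A single scalar summary of the shift can only capture shifts whose effect on accuracy it happens to track, which illustrates why a predictor of this form may lack the expressiveness needed to generalize to unseen domains.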
Taxonomy
Topics: Data Stream Mining Techniques · Context-Aware Activity Recognition Systems · Machine Learning and Data Classification
