Diagnostic Tool for Out-of-Sample Model Evaluation
Ludvig Hult, Dave Zachariah, Petre Stoica

TL;DR
This paper introduces a simple, interpretable diagnostic tool that uses a finite calibration data set to characterize a model's out-of-sample losses. It provides finite-sample guarantees under weak assumptions and aids model selection and hyper-parameter tuning.
Contribution
It proposes a novel, easy-to-compute diagnostic method with finite-sample guarantees for assessing out-of-sample losses under weak assumptions.
Findings
Quantifies the impact of distribution shifts.
Assists in regression analysis.
Supports model selection and hyper-parameter tuning.
Abstract
Assessment of model fitness is a key part of machine learning. The standard paradigm is to learn models by minimizing a chosen loss function averaged over training data, with the aim of achieving small losses on future data. In this paper, we consider the use of a finite calibration data set to characterize the future, out-of-sample losses of a model. We propose a simple model diagnostic tool that provides finite-sample guarantees under weak assumptions. The tool is simple to compute and to interpret. Several numerical experiments are presented to show how the proposed method quantifies the impact of distribution shifts, aids the analysis of regression, and enables model selection as well as hyper-parameter tuning.
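The abstract does not spell out the construction, so the following is only a minimal sketch of one standard way to obtain a finite-sample statement about future losses from calibration data. It is not necessarily the authors' method. It assumes the calibration losses and the future loss are exchangeable, in which case the k-th smallest of n calibration losses bounds a new loss with probability at least k/(n+1). The function name `loss_quantile_bound` and the parameter `alpha` are hypothetical and chosen for illustration.

```python
import math


def loss_quantile_bound(calibration_losses, alpha):
    """Return a threshold that a new loss exceeds with probability at most alpha.

    Assumption (not taken from the paper): the calibration losses and the
    future loss are exchangeable. Under that assumption the k-th smallest
    calibration loss covers a new loss with probability at least k / (n + 1).
    """
    n = len(calibration_losses)
    # Smallest rank k such that k / (n + 1) >= 1 - alpha.
    k = math.ceil((1 - alpha) * (n + 1))
    if k > n:
        # Too few calibration points to certify this level; only the
        # trivial bound is available.
        return math.inf
    return sorted(calibration_losses)[k - 1]


# Hypothetical calibration losses, e.g. absolute errors on held-out data.
losses = [9, 1, 8, 2, 7, 3, 6, 4, 5]
bound = loss_quantile_bound(losses, alpha=0.5)
```

Comparing such bounds across candidate models or hyper-parameter settings, at a fixed `alpha`, is one way a diagnostic of this kind could inform model selection. With very few calibration points the function returns infinity, which reflects that no nontrivial guarantee is available at the requested level.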
Taxonomy
Topics: Machine Learning and Data Classification · Gaussian Processes and Bayesian Inference · Model Reduction and Neural Networks
