Evaluating machine learning models in non-standard settings: An overview and new findings
Roman Hornung, Malte Nalenz, Lennart Schneider, Andreas Bender, Ludwig Bothmann, Bernd Bischl, Thomas Augustin, Anne-Laure Boulesteix

TL;DR
This paper reviews and introduces guidelines for estimating the generalization error of machine learning models in non-standard data settings, emphasizing the need for tailored resampling methods to avoid bias.
Contribution
It provides a comprehensive overview of generalization error (GE) estimation techniques in non-standard settings and presents new simulation results that support the need for tailored methods.
Findings
Standard resampling often yields biased GE estimates in non-standard settings
Tailored GE estimation methods improve accuracy in clustered, spatial, and hierarchical data
Simulation studies confirm the importance of setting-specific resampling approaches
Abstract
Estimating the generalization error (GE) of machine learning models is fundamental, with resampling methods being the most common approach. However, in non-standard settings, particularly those where observations are not independently and identically distributed, resampling using simple random data divisions may lead to biased GE estimates. This paper strives to present well-grounded guidelines for GE estimation in various such non-standard settings: clustered data, spatial data, unequal sampling probabilities, concept drift, and hierarchically structured outcomes. Our overview combines well-established methodologies with other existing methods that, to our knowledge, have not been frequently considered in these particular settings. A unifying principle among these techniques is that the test data used in each iteration of the resampling procedure should reflect the new observations to…
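For the clustered-data setting named in the abstract, the principle that test data should resemble the new observations can be illustrated with a grouped split. The following is a minimal sketch, not the authors' implementation. It assumes the model will be applied to observations from previously unseen clusters, so whole clusters are held out together. The function name `grouped_kfold` and the toy cluster labels are hypothetical and used only for illustration.

```python
import random


def grouped_kfold(cluster_ids, n_folds=5, seed=0):
    """Yield (train_idx, test_idx) pairs in which whole clusters are held out.

    Hypothetical helper for illustration: every observation of a given
    cluster lands in the same fold, so no cluster is split between the
    training and test sets.
    """
    clusters = sorted(set(cluster_ids))
    if n_folds > len(clusters):
        raise ValueError("n_folds cannot exceed the number of clusters")
    rng = random.Random(seed)
    rng.shuffle(clusters)
    # Deal the shuffled clusters round-robin into folds.
    fold_of = {c: i % n_folds for i, c in enumerate(clusters)}
    for k in range(n_folds):
        test = [i for i, c in enumerate(cluster_ids) if fold_of[c] == k]
        train = [i for i, c in enumerate(cluster_ids) if fold_of[c] != k]
        yield train, test


# Hypothetical toy data: 12 observations from 4 clusters (e.g. patients).
cluster_ids = ["a", "a", "a", "b", "b", "b", "c", "c", "c", "d", "d", "d"]
splits = list(grouped_kfold(cluster_ids, n_folds=4))

for train, test in splits:
    train_clusters = {cluster_ids[i] for i in train}
    test_clusters = {cluster_ids[i] for i in test}
    # No cluster appears on both sides of the split.
    assert train_clusters.isdisjoint(test_clusters)
```

A simple random split of the same 12 observations would typically place members of one cluster in both the training and test sets, which is the situation the abstract warns can bias the GE estimate. If the goal were instead to predict new observations from clusters already seen in training, a different split design would be appropriate.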
Taxonomy
Topics: Data Stream Mining Techniques · Machine Learning and Data Classification · Statistical Methods and Bayesian Inference
Methods: Sparse Evolutionary Training
