Evaluating machine learning models in non-standard settings: An overview and new findings

Roman Hornung; Malte Nalenz; Lennart Schneider; Andreas Bender; Ludwig Bothmann; Bernd Bischl; Thomas Augustin; Anne-Laure Boulesteix

arXiv:2310.15108·stat.ML·October 24, 2023·1 cites

Open Access

TL;DR

This paper reviews and introduces guidelines for estimating the generalization error of machine learning models in non-standard data settings, emphasizing the need for tailored resampling methods to avoid bias.

Contribution

It provides a comprehensive overview of generalization error (GE) estimation techniques in non-standard settings and presents new simulation results validating the necessity of tailored methods.

Findings

01. Standard resampling often yields biased GE estimates in non-standard settings.

02. Tailored GE estimation methods improve accuracy in clustered, spatial, and hierarchical data.

03. Simulation studies confirm the importance of setting-specific resampling approaches.

Abstract

Estimating the generalization error (GE) of machine learning models is fundamental, with resampling methods being the most common approach. However, in non-standard settings, particularly those where observations are not independently and identically distributed, resampling using simple random data divisions may lead to biased GE estimates. This paper strives to present well-grounded guidelines for GE estimation in various such non-standard settings: clustered data, spatial data, unequal sampling probabilities, concept drift, and hierarchically structured outcomes. Our overview combines well-established methodologies with other existing methods that, to our knowledge, have not been frequently considered in these particular settings. A unifying principle among these techniques is that the test data used in each iteration of the resampling procedure should reflect the new observations to…
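As an illustration of the clustered-data case, here is a minimal sketch of grouped k-fold splitting, assuming the goal is to predict for clusters that were not seen during training. The function name `grouped_kfold`, the toy cluster labels, and the round-robin fold assignment are illustrative assumptions, not the authors' procedure. The idea is that whole clusters are assigned to folds, so no cluster contributes observations to both the training and the test set in the same iteration.

```python
import random


def grouped_kfold(groups, k, seed=0):
    """Yield (train_idx, test_idx) pairs; each cluster lies entirely in one side.

    `groups` holds one cluster label per observation (hypothetical input format).
    """
    unique = sorted(set(groups))
    if k > len(unique):
        raise ValueError("k cannot exceed the number of clusters")
    rng = random.Random(seed)
    rng.shuffle(unique)
    # Deal the shuffled clusters round-robin into k folds.
    fold_of = {g: i % k for i, g in enumerate(unique)}
    for fold in range(k):
        test_idx = [i for i, g in enumerate(groups) if fold_of[g] == fold]
        train_idx = [i for i, g in enumerate(groups) if fold_of[g] != fold]
        yield train_idx, test_idx


# Toy data (assumed): 6 clusters, e.g. patients, with 3 observations each.
groups = [c for c in range(6) for _ in range(3)]
splits = list(grouped_kfold(groups, k=3))

for train_idx, test_idx in splits:
    train_clusters = {groups[i] for i in train_idx}
    test_clusters = {groups[i] for i in test_idx}
    print(sorted(test_clusters), "held out; overlap:", train_clusters & test_clusters)
```

A plain random split of the 18 observations would typically place observations from the same cluster on both sides, which is the situation the abstract warns can bias the GE estimate. If predictions are instead wanted for new observations from already-known clusters, a different splitting scheme would be appropriate.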

Taxonomy

Topics: Data Stream Mining Techniques · Machine Learning and Data Classification · Statistical Methods and Bayesian Inference

Methods: Sparse Evolutionary Training