Sample Size Planning for Classification Models
Claudia Beleites, Ute Neugebauer, Thomas Bocklitz, Christoph Krafft, Jürgen Popp

TL;DR
This paper explores how to determine the appropriate sample sizes for training and validating classification models in biospectroscopy, emphasizing the importance of sufficient test samples to reliably assess performance and compare classifiers.
Contribution
It provides methods to calculate necessary sample sizes for classifier validation and comparison, especially in small sample size scenarios common in biospectroscopy.
Findings
75-100 test samples usually needed to validate a classifier with reasonable precision
Large test sample sizes often required to demonstrate classifier superiority
Learning curves can be obscured by test uncertainty in small samples
Abstract
In biospectroscopy, suitably annotated and statistically independent samples (e.g. patients, batches, etc.) for classifier training and testing are scarce and costly. Learning curves show the model performance as a function of the training sample size and can help to determine the sample size needed to train good classifiers. However, building a good model is actually not enough: the performance must also be proven. We discuss learning curves for typical small sample size situations with 5-25 independent samples per class. Although the classification models achieve acceptable performance, the learning curve can be completely masked by the random testing uncertainty due to the equally limited test sample size. In consequence, we determine test sample sizes necessary to achieve reasonable precision in the validation and find that 75-100 samples will usually be needed to test a good but…
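The effect of test sample size on validation precision can be illustrated with a small sketch. It treats each independent test sample as a Bernoulli trial and computes a 95% Wilson score interval for the observed proportion of correct predictions. The Wilson interval, the function names `wilson_interval` and `interval_width`, and the 90% observed accuracy are assumptions chosen for illustration; they are not necessarily the method or the numbers used in the paper.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion.

    successes: number of correctly classified test samples
    n:         number of independent test samples
    z:         normal quantile (1.96 gives roughly 95% coverage)
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= successes <= n:
        raise ValueError("successes must lie between 0 and n")
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


def interval_width(successes, n, z=1.96):
    """Width of the Wilson interval: a rough measure of test uncertainty."""
    lower, upper = wilson_interval(successes, n, z)
    return upper - lower


if __name__ == "__main__":
    # Hypothetical test results, each with 90% observed accuracy.
    for successes, n in [(9, 10), (18, 20), (90, 100)]:
        lower, upper = wilson_interval(successes, n)
        print(f"n={n:3d}: observed {successes / n:.2f}, "
              f"95% interval [{lower:.3f}, {upper:.3f}], "
              f"width {upper - lower:.3f}")
```

With only 10 or 20 test samples the interval is wide enough to hide the kind of performance differences a learning curve is meant to show. It narrows considerably as the test set approaches 100 samples.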
