Making Early Predictions of the Accuracy of Machine Learning Applications

J. E. Smith; P. Caleb-Solly; M. A. Tahir; D. Sannen; H. van-Brussel

arXiv:1212.1100·cs.LG·December 6, 2012·1 cites

Open Access

TL;DR

This paper develops models to predict the potential improvement in machine learning accuracy from additional training data, enabling early decisions on whether further training is worthwhile.

Contribution

It introduces a novel approach to predict bias, variance, and total error after limited training, aiding early assessment of machine learning model performance.

Findings

01 Predictions correlate highly with the accuracy actually observed after further training.

02 The models generalize well to unseen algorithms and datasets.

03 The approach accurately estimates upper bounds on the accuracy of ensemble classifiers.

Abstract

The accuracy of machine learning systems is a widely studied research topic. Established techniques such as cross-validation predict the accuracy on unseen data of the classifier produced by applying a given learning method to a given training data set. However, they do not predict whether incurring the cost of obtaining more data and undergoing further training will lead to higher accuracy. In this paper we investigate techniques for making such early predictions. We note that when a machine learning algorithm is presented with a training set, the classifier produced, and hence its error, will depend on the characteristics of the algorithm, on the training set's size, and also on its specific composition. In particular we hypothesise that if a number of classifiers are produced, and their observed error is decomposed into bias and variance terms, then although these components may behave…
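To make the idea of decomposing observed error into bias and variance terms concrete, here is a minimal sketch in Python. The abstract as shown does not say which decomposition the authors use, so this sketch assumes one common definition for 0-1 loss (the "main prediction" formulation, with noise-free labels). The function name `decompose_zero_one` and the toy data are hypothetical and chosen only for illustration.

```python
from collections import Counter


def decompose_zero_one(predictions, truth):
    """Estimate bias, variance, and mean error under 0-1 loss.

    Assumption (not taken from the paper): the "main prediction" for an
    item is the label most classifiers vote for; bias is 1 when the main
    prediction is wrong; variance is the fraction of classifiers that
    disagree with the main prediction. Labels are assumed noise-free.

    predictions: list of per-classifier label lists, all the same length
    truth:       list of true labels
    """
    n_models = len(predictions)
    n_items = len(truth)
    bias_sum = var_sum = err_sum = 0.0
    for i in range(n_items):
        votes = [p[i] for p in predictions]
        main = Counter(votes).most_common(1)[0][0]
        bias_sum += 1.0 if main != truth[i] else 0.0
        var_sum += sum(v != main for v in votes) / n_models
        err_sum += sum(v != truth[i] for v in votes) / n_models
    return bias_sum / n_items, var_sum / n_items, err_sum / n_items


# Toy data: three classifiers, each trained (hypothetically) on a
# different sample of the same size, evaluated on four test items.
truth = [0, 1, 1, 0]
predictions = [
    [0, 1, 0, 0],
    [0, 1, 0, 1],
    [0, 0, 0, 0],
]
bias, variance, error = decompose_zero_one(predictions, truth)
print(f"bias={bias:.3f} variance={variance:.3f} error={error:.3f}")
```

In this toy case the one item every classifier gets wrong contributes only to bias, while the two items on which a single classifier deviates from the majority contribute only to variance. Tracking how such terms change as the training set grows is the kind of signal the abstract suggests could feed an early prediction, though the exact procedure is not given in the text shown here.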

Taxonomy

Topics: Machine Learning and Data Classification · Neural Networks and Applications · Data Mining Algorithms and Applications