An imputation method for estimating the learning curve in classification problems
Eric B. Laber, Kerby Shedden, Yang Yang

TL;DR
This paper introduces a novel imputation-based method for estimating learning curves in classification tasks, enabling better assessment of potential gains from additional training data.
Contribution
The paper proposes an imputation approach for estimating learning curves in classification problems and reports, based on simulation studies, that it outperforms an alternative parameterization-based estimation approach.
Findings
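Simulation studies indicate that the learning curve can be estimated with useful accuracy for training sets up to roughly four times the size of the available data.
In the same simulations, the imputation approach outperforms an alternative parameterization-based estimation approach.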
Application to disease progression prediction demonstrates practical utility.
Abstract
The learning curve expresses the error rate of a predictive modeling procedure as a function of the sample size of the training dataset. It typically is a decreasing, convex function with a positive limiting value. An estimate of the learning curve can be used to assess whether a modeling procedure should be expected to become substantially more accurate if additional training data become available. This article proposes a new procedure for estimating learning curves using imputation. We focus on classification, although the idea is applicable to other predictive modeling settings. Simulation studies indicate that the learning curve can be estimated with useful accuracy for a roughly four-fold increase in the size of the training set relative to the available data, and that the proposed imputation approach outperforms an alternative estimation approach based on parameterizing the…
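To make the definition of a learning curve concrete, the following is a minimal sketch that estimates the error rate as a function of training-set size by repeated subsampling within an available dataset. It is not the paper's imputation procedure and does not extrapolate beyond the available data. The simulated two-class Gaussian data, the nearest-centroid classifier, and every function name (`make_data`, `fit_centroids`, `error_rate`, `empirical_learning_curve`) are assumptions chosen for illustration.

```python
import random
import statistics


def make_data(n, rng, shift=1.0):
    """Simulate n labeled 2-D points (illustrative data, not from the paper).

    Class 0 has mean (0, 0), class 1 has mean (shift, shift); unit variance.
    """
    data = []
    for _ in range(n):
        y = rng.randint(0, 1)
        x = (rng.gauss(y * shift, 1.0), rng.gauss(y * shift, 1.0))
        data.append((x, y))
    return data


def fit_centroids(train):
    """Nearest-centroid classifier: return per-class mean vectors.

    Returns None if either class is missing from the training sample.
    """
    centroids = {}
    for label in (0, 1):
        pts = [x for x, y in train if y == label]
        if not pts:
            return None
        centroids[label] = tuple(sum(c) / len(pts) for c in zip(*pts))
    return centroids


def error_rate(centroids, test):
    """Fraction of test points assigned to the wrong class."""
    def predict(x):
        return min(
            centroids,
            key=lambda k: sum((a - b) ** 2 for a, b in zip(x, centroids[k])),
        )
    return sum(predict(x) != y for x, y in test) / len(test)


def empirical_learning_curve(data, sizes, reps, rng):
    """Average held-out error for each training size m, over random splits."""
    curve = {}
    for m in sizes:
        errs = []
        for _ in range(reps):
            shuffled = data[:]
            rng.shuffle(shuffled)
            train, test = shuffled[:m], shuffled[m:]
            centroids = fit_centroids(train)
            if centroids is None:
                continue  # skip splits where one class is absent
            errs.append(error_rate(centroids, test))
        curve[m] = statistics.mean(errs)
    return curve


rng = random.Random(0)
data = make_data(400, rng)
curve = empirical_learning_curve(
    data, sizes=[4, 8, 16, 32, 64, 128], reps=100, rng=rng
)
for m, err in curve.items():
    print(f"n={m:4d}  estimated error={err:.3f}")
```

In this sketch the estimated error should fall as the training size grows and flatten toward a positive value, matching the decreasing shape with a positive limit described in the abstract. Estimating the curve at sample sizes larger than the available data is the harder problem that the paper addresses.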
Taxonomy
Topics: Chronic Lymphocytic Leukemia Research
