Investigating the Impact of Hard Samples on Accuracy Reveals In-class Data Imbalance
Pawel Pukowski, Haiping Lu

TL;DR
This paper examines how the distribution of hard samples within datasets affects model evaluation, revealing limitations of test accuracy and proposing a benchmarking procedure for hard sample identification methods.
Contribution
The paper introduces the in-class data imbalance problem and proposes a benchmarking procedure for comparing hard sample identification methods, highlighting the limitations of test accuracy as an evaluation metric.
Findings
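The distribution of hard samples between training and test sets affects the difficulty of those sets and the perceived generalization capability of models
Two distinct generalization pathways are identified: toward easy samples and toward hard samples
A benchmarking procedure is proposed for comparing hard sample identification methods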
Abstract
In the AutoML domain, test accuracy is heralded as the quintessential metric for evaluating model efficacy, underpinning a wide array of applications from neural architecture search to hyperparameter optimization. However, the reliability of test accuracy as the primary performance metric has been called into question, notably through research highlighting how label noise can obscure the true ranking of state-of-the-art models. We venture beyond, along another perspective where the existence of hard samples within datasets casts further doubt on the generalization capabilities inferred from test accuracy alone. Our investigation reveals that the distribution of hard samples between training and test sets affects the difficulty levels of those sets, thereby influencing the perceived generalization capability of models. We unveil two distinct generalization pathways, toward easy and hard…
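To make concrete the abstract's claim that the share of hard samples in a split changes measured accuracy, here is a minimal sketch. It is not the paper's method: it only applies the fact that overall accuracy is the weighted average of accuracy on easy and on hard samples. The function name expected_accuracy and the per-group accuracies (0.98 and 0.60) are hypothetical values chosen for illustration.

```python
def expected_accuracy(hard_fraction: float, acc_easy: float, acc_hard: float) -> float:
    """Expected accuracy on a split in which `hard_fraction` of samples are hard.

    Overall accuracy is the mixture of the per-group accuracies, weighted by
    how many samples of each group the split contains.
    """
    if not 0.0 <= hard_fraction <= 1.0:
        raise ValueError("hard_fraction must be in [0, 1]")
    return (1.0 - hard_fraction) * acc_easy + hard_fraction * acc_hard


# Hypothetical per-group accuracies for one fixed model (illustrative only,
# not taken from the paper).
ACC_EASY, ACC_HARD = 0.98, 0.60

if __name__ == "__main__":
    # The model stays the same; only the composition of the test split varies.
    for hard_fraction in (0.05, 0.10, 0.20):
        acc = expected_accuracy(hard_fraction, ACC_EASY, ACC_HARD)
        print(f"hard fraction {hard_fraction:.2f} -> expected test accuracy {acc:.3f}")
```

Under these assumed numbers, the same model scores lower simply because the test split contains more hard samples, which is why a single test accuracy figure may say as much about the split as about the model.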
Taxonomy
Topics: Imbalanced Data Classification Techniques · Financial Distress and Bankruptcy Prediction
