Identifying Wrongly Predicted Samples: A Method for Active Learning
Rahaf Aljundi, Nikolay Chumerin, Daniel Olmeda Reino

TL;DR
This paper introduces a sample selection method for active learning that identifies wrongly predicted samples by assessing their impact on the generalization error, improving annotation efficiency, especially on imbalanced datasets.
Contribution
The authors propose a new criterion for active learning that moves beyond uncertainty, effectively identifying wrongly predicted samples without retraining, and demonstrate state-of-the-art results.
Findings
Achieves state-of-the-art performance on active learning benchmarks.
Effectively identifies wrongly predicted samples in imbalanced datasets.
Method is simple, model-agnostic, and efficient.
Abstract
State-of-the-art machine learning models require access to a significant amount of annotated data to achieve the desired level of performance. While unlabelled data is often readily available, even abundant, the annotation process can be expensive and limiting. Under the assumption that some samples are more important for a given task than others, active learning targets the problem of identifying the most informative samples to acquire annotations for. Instead of relying, as is conventional, on model uncertainty as a proxy for the unknown labels, in this work we propose a simple sample selection criterion that moves beyond uncertainty. By first accepting the model prediction and then judging its effect on the generalization error, we can better identify wrongly predicted samples. We further present an approximation to our criterion that is very efficient and…
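The abstract describes the criterion only at a high level and is cut off before the efficient approximation, so the following is not the authors' algorithm. It is a minimal, deliberately naive Python (NumPy) sketch contrasting the two ideas the abstract names: a conventional entropy-based uncertainty score, and an "accept the prediction, then judge its effect on the generalization error" score. Assumptions: a binary logistic-regression model, a small labeled validation set standing in for the generalization error, and a full retrain per candidate (which the summary above says the authors' method avoids). All function names (`train_logreg`, `entropy_scores`, `accept_then_judge_scores`, `select`) are hypothetical.

```python
# Hypothetical illustration, NOT the authors' method: a naive, retraining-based
# version of "accept the prediction, then measure the effect on held-out loss",
# next to an entropy (uncertainty) baseline. Assumes binary logistic regression
# and a labeled validation set as a proxy for the generalization error.
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_logreg(X, y, lr=0.5, steps=300, l2=1e-2):
    """Fit binary logistic regression with full-batch gradient descent."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        r = sigmoid(X @ w + b) - y          # residuals p - y
        w -= lr * (X.T @ r / len(y) + l2 * w)
        b -= lr * r.mean()
    return w, b


def log_loss(w, b, X, y, eps=1e-12):
    p = np.clip(sigmoid(X @ w + b), eps, 1 - eps)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))


def entropy_scores(w, b, X_pool, eps=1e-12):
    """Uncertainty baseline: binary predictive entropy (higher = less sure)."""
    p = np.clip(sigmoid(X_pool @ w + b), eps, 1 - eps)
    return -(p * np.log(p) + (1 - p) * np.log(1 - p))


def accept_then_judge_scores(X_lab, y_lab, X_pool, X_val, y_val):
    """Accept each pool prediction as a label, retrain, and return the change
    in held-out loss; a positive change flags a possibly wrong prediction."""
    w, b = train_logreg(X_lab, y_lab)
    base = log_loss(w, b, X_val, y_val)
    y_hat = (sigmoid(X_pool @ w + b) >= 0.5).astype(float)  # accepted labels
    scores = np.empty(len(X_pool))
    for i in range(len(X_pool)):
        X_aug = np.vstack([X_lab, X_pool[i:i + 1]])
        y_aug = np.append(y_lab, y_hat[i])
        w_i, b_i = train_logreg(X_aug, y_aug)   # naive full retrain per sample
        scores[i] = log_loss(w_i, b_i, X_val, y_val) - base
    return scores


def select(scores, k):
    """Indices of the k highest-scoring pool samples."""
    return np.argsort(scores)[::-1][:k]


if __name__ == "__main__":
    rng = np.random.default_rng(0)

    def blobs(n):
        # Two overlapping Gaussian classes in 2-D (synthetic toy data).
        X = np.vstack([rng.normal(-1.0, 1.0, (n, 2)), rng.normal(1.0, 1.0, (n, 2))])
        y = np.concatenate([np.zeros(n), np.ones(n)])
        return X, y

    X_lab, y_lab = blobs(5)
    X_pool, _ = blobs(20)
    X_val, y_val = blobs(50)

    w, b = train_logreg(X_lab, y_lab)
    print("entropy picks:", select(entropy_scores(w, b, X_pool), 5))
    print("accept-then-judge picks:",
          select(accept_then_judge_scores(X_lab, y_lab, X_pool, X_val, y_val), 5))
```

The per-candidate retraining loop costs one full fit per pool sample, which is exactly the expense an efficient approximation would need to remove; it is kept here only to make the criterion concrete.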
Videos
Identifying Wrongly Predicted Samples: A Method for Active Learning (YouTube)
Taxonomy
Topics: Machine Learning and Algorithms · Machine Learning and Data Classification · Imbalanced Data Classification Techniques
