Active Learning at the ImageNet Scale
Zeyad Ali Sami Emam, Hong-Min Chu, Ping-Yeh Chiang, Wojciech Czaja, Richard Leapman, Micah Goldblum, Tom Goldstein

TL;DR
This paper investigates the effectiveness of active learning combined with self-supervised pretraining on ImageNet, revealing challenges with class imbalance and proposing a simple, scalable balanced selection algorithm that outperforms random sampling.
Contribution
The study demonstrates that existing active learning methods underperform on ImageNet due to class imbalance and introduces BASE, a new scalable algorithm that improves sample selection.
Findings
Existing AL algorithms fail to outperform random sampling on ImageNet.
Class imbalance degrades the performance of AL algorithms on ImageNet.
BASE consistently outperforms random sampling by selecting more balanced samples.
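The idea of selecting a class-balanced batch can be illustrated with a minimal sketch. This is not the authors' BASE algorithm, whose details are not given in this summary. It only assumes that each unlabeled sample has some informativeness score and a predicted class from the current model, and it spreads the labeling budget evenly over the predicted classes. The function name `balanced_select`, its parameters, and the toy data are all hypothetical.

```python
import numpy as np


def balanced_select(scores, pred_labels, budget, num_classes):
    """Pick `budget` sample indices, spread evenly over predicted classes.

    Hypothetical illustration only: `scores` is any per-sample
    informativeness score (higher = more useful to label), and
    `pred_labels` are the current model's predicted classes.
    """
    scores = np.asarray(scores, dtype=float)
    pred_labels = np.asarray(pred_labels)
    quota = budget // num_classes  # equal share per predicted class
    chosen = []
    for c in range(num_classes):
        idx = np.flatnonzero(pred_labels == c)
        # Highest-scoring candidates of this predicted class first.
        top = idx[np.argsort(-scores[idx], kind="stable")][:quota]
        chosen.extend(top.tolist())
    # If some classes had too few candidates (or the budget does not
    # divide evenly), fill the remainder with the best unused samples.
    if len(chosen) < budget:
        taken = set(chosen)
        order = np.argsort(-scores, kind="stable").tolist()
        rest = [i for i in order if i not in taken]
        chosen.extend(rest[: budget - len(chosen)])
    return np.array(chosen, dtype=int)


# Toy pool: 90 samples predicted as class 0, 10 as class 1.
rng = np.random.default_rng(0)
pred = np.array([0] * 90 + [1] * 10)
scores = rng.random(100)

picked = balanced_select(scores, pred, budget=10, num_classes=2)
counts = np.bincount(pred[picked], minlength=2)
print(counts)
```

In this toy pool, picking the top 10 scores without the per-class quota would tend to mirror the 90/10 skew, whereas the quota forces five picks from each predicted class. Predicted labels are only a proxy for true classes, so the balance achieved after annotation may differ.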
Abstract
Active learning (AL) algorithms aim to identify an optimal subset of data for annotation, such that deep neural networks (DNN) can achieve better performance when trained on this labeled subset. AL is especially impactful in industrial scale settings where data labeling costs are high and practitioners use every tool at their disposal to improve model performance. The recent success of self-supervised pretraining (SSP) highlights the importance of harnessing abundant unlabeled data to boost model performance. By combining AL with SSP, we can make use of unlabeled data while simultaneously labeling and training on particularly informative samples. In this work, we study a combination of AL and SSP on ImageNet. We find that performance on small toy datasets -- the typical benchmark setting in the literature -- is not representative of performance on ImageNet due to the class imbalanced…
Taxonomy
Topics: Machine Learning and Algorithms · Machine Learning and Data Classification · Domain Adaptation and Few-Shot Learning
Methods: Balanced Selection
