Active Learning at the ImageNet Scale

Zeyad Ali Sami Emam; Hong-Min Chu; Ping-Yeh Chiang; Wojciech Czaja; Richard Leapman; Micah Goldblum; Tom Goldstein

arXiv:2111.12880·cs.CV·November 29, 2021

TL;DR

This paper investigates the effectiveness of active learning combined with self-supervised pretraining on ImageNet, revealing challenges with class imbalance and proposing a simple, scalable balanced selection algorithm that outperforms random sampling.

Contribution

The study demonstrates that existing active learning methods underperform on ImageNet due to class imbalance and introduces BASE, a new scalable algorithm that improves sample selection.
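To make the class-imbalance claim concrete, the following is a minimal sketch of how one might quantify how evenly a labeled batch covers the classes. The function name `class_balance_stats` and the choice of normalized entropy as the balance measure are illustrative assumptions, not something taken from the paper.

```python
import math
from collections import Counter


def class_balance_stats(labels, num_classes):
    """Summarize how evenly a selected batch covers the classes.

    Hypothetical helper: `labels` are integer class ids in [0, num_classes).
    """
    counts = Counter(labels)
    per_class = [counts.get(c, 0) for c in range(num_classes)]
    total = sum(per_class)
    if total == 0:
        raise ValueError("labels must contain at least one valid class id")
    # Shannon entropy of the empirical class distribution.
    entropy = -sum((n / total) * math.log(n / total) for n in per_class if n > 0)
    # Normalize by log(num_classes): 1.0 means perfectly balanced.
    normalized = entropy / math.log(num_classes) if num_classes > 1 else 1.0
    return {
        "per_class": per_class,
        "missing_classes": sum(1 for n in per_class if n == 0),
        "normalized_entropy": normalized,
    }
```

A value of `normalized_entropy` near 1.0 indicates a balanced batch, while a low value or a large `missing_classes` count indicates the kind of skew the study attributes to existing AL methods.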

Findings

1. Existing AL algorithms fail to outperform random sampling on ImageNet.

2. Class imbalance affects the performance of AL algorithms.

3. BASE consistently outperforms random sampling by selecting more balanced samples.
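The idea of selecting a class-balanced batch can be sketched as follows. This page does not describe how BASE works, so the code below is not the BASE algorithm; it is a generic illustration that combines margin-based uncertainty with an equal per-class quota over predicted classes. The function name `balanced_margin_selection` and the input `probs` (per-sample class probabilities from some model) are assumptions made for the example.

```python
def balanced_margin_selection(probs, budget):
    """Pick `budget` pool indices, spread evenly over predicted classes.

    Illustrative only, NOT the paper's BASE algorithm.
    probs: list of per-sample class-probability lists (at least 2 classes).
    """
    num_classes = len(probs[0])
    scored = []
    for idx, p in enumerate(probs):
        ranked = sorted(p, reverse=True)
        predicted = max(range(num_classes), key=lambda c: p[c])
        # Margin = top-1 minus top-2 probability; a small margin means uncertain.
        scored.append((ranked[0] - ranked[1], idx, predicted))
    scored.sort()  # most uncertain samples first

    quota = budget // num_classes  # equal share per predicted class
    taken = [0] * num_classes
    selected, leftovers = [], []
    for _margin, idx, predicted in scored:
        if taken[predicted] < quota:
            taken[predicted] += 1
            selected.append(idx)
        else:
            leftovers.append(idx)
    # If some classes could not fill their quota, top up with the most
    # uncertain samples that were skipped.
    selected.extend(leftovers[: budget - len(selected)])
    return selected
```

Compared with taking the globally most uncertain samples, the per-class quota keeps any single predicted class from dominating the batch, at the cost of sometimes passing over a more uncertain sample.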

Abstract

Active learning (AL) algorithms aim to identify an optimal subset of data for annotation, such that deep neural networks (DNNs) can achieve better performance when trained on this labeled subset. AL is especially impactful in industrial-scale settings where data labeling costs are high and practitioners use every tool at their disposal to improve model performance. The recent success of self-supervised pretraining (SSP) highlights the importance of harnessing abundant unlabeled data to boost model performance. By combining AL with SSP, we can make use of unlabeled data while simultaneously labeling and training on particularly informative samples. In this work, we study a combination of AL and SSP on ImageNet. We find that performance on small toy datasets -- the typical benchmark setting in the literature -- is not representative of performance on ImageNet due to the class imbalanced…

Code & Models

Repositories

zeyademam/active_learning (PyTorch, official)

Taxonomy

Topics: Machine Learning and Algorithms · Machine Learning and Data Classification · Domain Adaptation and Few-Shot Learning

Methods: Balanced Selection