A Simple Baseline for Low-Budget Active Learning

Kossar Pourahmadi; Parsa Nooralinejad; Hamed Pirsiavash

arXiv:2110.12033·cs.CV·April 4, 2022

TL;DR

This paper demonstrates that in low-budget active learning scenarios, a simple K-means clustering approach using features from self-supervised learning can outperform more complex methods, providing a practical baseline.

Contribution

The study introduces a straightforward, effective baseline for low-budget active learning by leveraging self-supervised features and K-means clustering, challenging complex query strategies.

Findings

01. K-means outperforms state-of-the-art active learning methods at very low budgets.

02. Self-supervised features are effective for sampling in low-label regimes.

03. Simple clustering can serve as a strong baseline for low-budget active learning.

Abstract

Active learning focuses on choosing a subset of unlabeled data to be labeled. However, most such methods assume that a large subset of the data can be annotated. We are interested in low-budget active learning where only a small subset (e.g., 0.2% of ImageNet) can be annotated. Instead of proposing a new query strategy to iteratively sample batches of unlabeled data given an initial pool, we learn rich features by an off-the-shelf self-supervised learning method only once, and then study the effectiveness of different sampling strategies given a low labeling budget on a variety of datasets including ImageNet. We show that although the state-of-the-art active learning methods work well given a large labeling budget, a simple K-means clustering algorithm can outperform them on low budgets. We believe this method can be used as a simple baseline for low-budget active learning on image…
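The selection step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes the self-supervised features are already extracted into an array, that the number of clusters equals the labeling budget, and that the sample nearest each centroid is the one sent for annotation (the page does not spell these details out). The function name `kmeans_select` and its parameters are hypothetical.

```python
import numpy as np


def kmeans_select(features, budget, n_iters=50, seed=0):
    """Return `budget` distinct sample indices, one near each K-means centroid.

    Assumption: `features` is an (n_samples, dim) array of precomputed
    self-supervised embeddings; the number of clusters equals the budget.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(features, dtype=np.float64)
    n = x.shape[0]
    if not 0 < budget <= n:
        raise ValueError("budget must be between 1 and the number of samples")

    # Initialise centroids from `budget` distinct random samples.
    centroids = x[rng.choice(n, size=budget, replace=False)].copy()

    for _ in range(n_iters):
        # Squared Euclidean distances, shape (n_samples, budget).
        dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        assign = dists.argmin(axis=1)
        new_centroids = centroids.copy()
        for k in range(budget):
            members = x[assign == k]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                new_centroids[k] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids

    # Query the unselected sample closest to each final centroid.
    dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    selected = []
    for k in range(budget):
        for idx in np.argsort(dists[:, k]):
            if int(idx) not in selected:
                selected.append(int(idx))
                break
    return selected


if __name__ == "__main__":
    # Toy stand-in for real embeddings: 300 random 16-dimensional vectors.
    demo_features = np.random.default_rng(1).normal(size=(300, 16))
    to_label = kmeans_select(demo_features, budget=6)
    print("indices to annotate:", to_label)
```

The dense distance matrix keeps the sketch short but scales as samples × budget × dimensions, so a dataset the size of ImageNet would need a mini-batch or GPU K-means instead.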

Code & Models

Repositories

ucdvision/low-budget-al (PyTorch, official)

Taxonomy

Methods: k-Means Clustering