TL;DR
This paper demonstrates that in low-budget active learning scenarios, a simple K-means clustering approach using features from self-supervised learning can outperform more complex methods, providing a practical baseline.
Contribution
The study introduces a straightforward, effective baseline for low-budget active learning by leveraging self-supervised features and K-means clustering, challenging complex query strategies.
Findings
K-means clustering on self-supervised features outperforms state-of-the-art active learning methods at very low labeling budgets (e.g., 0.2% of ImageNet).
Self-supervised features are effective for sampling in low-label regimes.
Simple clustering can serve as a strong baseline for low-budget active learning.
Abstract
Active learning focuses on choosing a subset of unlabeled data to be labeled. However, most such methods assume that a large subset of the data can be annotated. We are interested in low-budget active learning where only a small subset (e.g., 0.2% of ImageNet) can be annotated. Instead of proposing a new query strategy to iteratively sample batches of unlabeled data given an initial pool, we learn rich features by an off-the-shelf self-supervised learning method only once, and then study the effectiveness of different sampling strategies given a low labeling budget on a variety of datasets including ImageNet. We show that although the state-of-the-art active learning methods work well given a large labeling budget, a simple K-means clustering algorithm can outperform them on low budgets. We believe this method can be used as a simple baseline for low-budget active learning on image…
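The selection step described in the abstract can be sketched as follows. This is a minimal, dependency-free illustration and not the authors' implementation. It assumes the feature vectors have already been extracted by some self-supervised model, that the number of clusters equals the labeling budget, and that the sample nearest each centroid is the one sent for annotation. The function names (`kmeans`, `select_for_labeling`) and the nearest-to-centroid rule are assumptions made for this example.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(features, k, iters=20, seed=0):
    """Plain Lloyd's K-means; returns (centroids, assignments)."""
    rng = random.Random(seed)
    # Initialise centroids from k distinct data points.
    centroids = [list(f) for f in rng.sample(features, k)]
    assign = [0] * len(features)
    for _ in range(iters):
        # Assignment step: nearest centroid for each feature vector.
        assign = [
            min(range(k), key=lambda c: sq_dist(f, centroids[c]))
            for f in features
        ]
        # Update step: mean of each cluster (an empty cluster keeps its centroid).
        for c in range(k):
            members = [f for f, a in zip(features, assign) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assign


def select_for_labeling(features, budget, seed=0):
    """Return sorted indices of samples to annotate: the member nearest each centroid.

    Assumption for this sketch: one cluster per unit of budget. If a cluster
    ends up empty, fewer than `budget` indices are returned.
    """
    centroids, assign = kmeans(features, budget, seed=seed)
    chosen = []
    for c, centroid in enumerate(centroids):
        members = [i for i, a in enumerate(assign) if a == c]
        if members:
            chosen.append(min(members, key=lambda i: sq_dist(features[i], centroid)))
    return sorted(chosen)


if __name__ == "__main__":
    # Synthetic stand-in for self-supervised features: three 2-D blobs.
    rng = random.Random(0)
    blob_centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
    feats = [
        [cx + rng.gauss(0, 1), cy + rng.gauss(0, 1)]
        for cx, cy in blob_centers
        for _ in range(30)
    ]
    print(select_for_labeling(feats, budget=3))
```

Because the features are computed only once, the selection runs as a single pass over the unlabeled pool rather than an iterative query loop. At realistic scale one would likely replace the pure-Python loops with an optimized K-means implementation.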
Taxonomy
Methods: k-Means Clustering
