Exploiting Diversity of Unlabeled Data for Label-Efficient Semi-Supervised Active Learning
Felix Buchert, Nassir Navab, Seong Tae Kim

TL;DR
This paper introduces a diversity-based algorithm for selecting the initial labeled dataset and a diversity-based query strategy for semi-supervised active learning. The initial selection operates on self-supervised embeddings and the query strategy on consistency-based embeddings, so that the selected samples are more informative; the approach improves performance on benchmark datasets.
Contribution
Proposes a diversity-based algorithm for selecting the initial labeled dataset, built on self-supervised embeddings, and a diversity-based active learning query strategy, built on consistency-based embeddings.
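Diversity-based initial selection amounts to picking a labeling budget of samples that cover the embedding space well. The sketch below is not the paper's algorithm. It uses greedy k-center selection (farthest-point traversal), a standard diversity technique, as a stand-in. It assumes the self-supervised embeddings were already computed by some encoder and are passed in as plain vectors. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_center_greedy(embeddings: List[Sequence[float]], budget: int) -> List[int]:
    """Return `budget` indices spread out over the embedding space.

    Farthest-point traversal: start from the sample nearest the mean
    embedding, then repeatedly add the sample whose distance to its
    closest already-selected sample is largest.
    """
    n = len(embeddings)
    if not 0 < budget <= n:
        raise ValueError("budget must be between 1 and the number of samples")
    dim = len(embeddings[0])
    mean = [sum(e[d] for e in embeddings) / n for d in range(dim)]
    first = min(range(n), key=lambda i: euclidean(embeddings[i], mean))
    selected = [first]
    chosen = {first}
    # min_dist[i]: distance from sample i to its nearest selected sample
    min_dist = [euclidean(e, embeddings[first]) for e in embeddings]
    while len(selected) < budget:
        # Already-chosen samples score -1 so exact duplicates are never re-picked
        nxt = max(range(n), key=lambda i: -1.0 if i in chosen else min_dist[i])
        selected.append(nxt)
        chosen.add(nxt)
        min_dist = [min(d, euclidean(e, embeddings[nxt]))
                    for d, e in zip(min_dist, embeddings)]
    return selected


if __name__ == "__main__":
    # Hypothetical 2-D "self-supervised" embeddings forming two clusters
    points = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
    print(k_center_greedy(points, 2))  # one index from each cluster
```

Greedy k-center is deterministic and needs no random seed. Each step costs one distance computation per sample. A real pipeline would likely vectorize this over a large unlabeled pool.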
Findings
Achieves superior results on the CIFAR-10 and Caltech-101 datasets.
Utilizes diversity of unlabeled data to enhance sample selection.
Improves label efficiency in semi-supervised active learning.
Abstract
The availability of large labeled datasets is the key component for the success of deep learning. However, annotating labels on large datasets is generally time-consuming and expensive. Active learning is a research area that addresses the issues of expensive labeling by selecting the most important samples for labeling. Diversity-based sampling algorithms are known as integral components of representation-based approaches for active learning. In this paper, we introduce a new diversity-based initial dataset selection algorithm to select the most informative set of samples for initial labeling in the active learning setting. Self-supervised representation learning is used to consider the diversity of samples in the initial dataset selection algorithm. Also, we propose a novel active learning query strategy, which uses diversity-based sampling on consistency-based embeddings. By…
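The abstract does not say how the consistency-based embeddings are constructed. The sketch below therefore makes two hypothetical choices. First, each unlabeled sample is embedded by concatenating the model's softmax outputs over several augmented views. Second, queries are drawn by k-means++-style D² sampling, measured against the already-labeled set. D² sampling is a stand-in diversity criterion and should not be read as the paper's query rule. `consistency_embedding` and `d2_query` are illustrative names.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def softmax(logits: Sequence[float]) -> Vector:
    """Numerically stable softmax over one logit vector."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def consistency_embedding(logits_per_view: Sequence[Sequence[float]]) -> Vector:
    """Concatenate the class probabilities predicted for several augmented
    views of one sample (hypothetical construction). Samples whose
    predictions agree across views and samples whose predictions flip
    end up in different regions of this space."""
    emb: Vector = []
    for logits in logits_per_view:
        emb.extend(softmax(logits))
    return emb


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def d2_query(unlabeled: List[Vector], labeled: List[Vector],
             budget: int, seed: int = 0) -> List[int]:
    """k-means++-style D^2 sampling: each query is drawn with probability
    proportional to its squared distance to the nearest labeled or
    already-queried embedding."""
    rng = random.Random(seed)
    # With no labeled set, every sample starts with equal weight
    min_d = [min((sq_dist(u, c) for c in labeled), default=1.0) for u in unlabeled]
    picked: List[int] = []
    for _ in range(min(budget, len(unlabeled))):
        if sum(min_d) == 0.0:
            break  # every remaining sample coincides with an existing center
        idx = rng.choices(range(len(unlabeled)), weights=min_d, k=1)[0]
        picked.append(idx)
        # The new pick becomes a center, so its own weight drops to zero
        min_d = [min(d, sq_dist(u, unlabeled[idx])) for d, u in zip(min_d, unlabeled)]
    return picked


if __name__ == "__main__":
    # Hypothetical logits for two augmented views of three unlabeled samples
    views = [
        [[4.0, 0.0], [4.0, 0.0]],  # confident and consistent
        [[4.0, 0.0], [0.0, 4.0]],  # prediction flips between views
        [[0.0, 4.0], [0.0, 4.0]],
    ]
    pool = [consistency_embedding(v) for v in views]
    labeled = [consistency_embedding([[4.0, 0.0], [4.0, 0.0]])]
    # Index 0 duplicates the labeled sample, so it is never queried
    print(d2_query(pool, labeled, budget=2))
```

D² sampling is randomized, which is why `seed` is exposed. Unlike a purely greedy farthest-point rule, it is less likely to spend the budget on isolated outliers.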
Taxonomy
Topics: Machine Learning and Algorithms · Machine Learning and Data Classification · Text and Document Classification Technologies
