Exploiting Diversity of Unlabeled Data for Label-Efficient Semi-Supervised Active Learning

Felix Buchert; Nassir Navab; Seong Tae Kim

arXiv:2207.12302 · cs.CV · July 26, 2022

TL;DR

This paper introduces a novel diversity-based initial dataset selection and query strategy for semi-supervised active learning, leveraging self-supervised and consistency-based embeddings to improve sample informativeness, leading to better performance on benchmark datasets.

Contribution

It proposes new diversity-based algorithms for initial dataset selection and active learning queries that incorporate self-supervised and consistency-based embeddings.

Findings

01. Achieves superior results on the CIFAR-10 and Caltech-101 datasets.
02. Utilizes the diversity of unlabeled data to enhance sample selection.
03. Improves label efficiency in semi-supervised active learning.

Abstract

The availability of large labeled datasets is the key component for the success of deep learning. However, annotating labels on large datasets is generally time-consuming and expensive. Active learning is a research area that addresses the issues of expensive labeling by selecting the most important samples for labeling. Diversity-based sampling algorithms are known as integral components of representation-based approaches for active learning. In this paper, we introduce a new diversity-based initial dataset selection algorithm to select the most informative set of samples for initial labeling in the active learning setting. Self-supervised representation learning is used to consider the diversity of samples in the initial dataset selection algorithm. Also, we propose a novel active learning query strategy, which uses diversity-based sampling on consistency-based embeddings. By…
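
The diversity-based selection described in the abstract can be illustrated with a small sketch. The abstract does not spell out the diversity criterion, so this example swaps in k-center greedy (farthest-point) selection, a common diversity-based sampler, and should not be read as the authors' algorithm. The function name `k_center_greedy`, the `budget` and `seed` parameters, and the toy 2-D points standing in for self-supervised embeddings are all assumptions made for illustration.

```python
import math
import random


def k_center_greedy(embeddings, budget, seed=0):
    """Pick `budget` indices whose embeddings are spread out (farthest-point rule).

    Hypothetical helper: a generic diversity sampler, not the paper's method.
    """
    if not 0 < budget <= len(embeddings):
        raise ValueError("budget must be between 1 and the number of samples")
    rng = random.Random(seed)
    first = rng.randrange(len(embeddings))
    selected = [first]
    # min_dist[i] = distance from sample i to its nearest selected sample.
    min_dist = [math.dist(e, embeddings[first]) for e in embeddings]
    while len(selected) < budget:
        # Next pick: the sample farthest from everything selected so far.
        nxt = max(range(len(embeddings)), key=min_dist.__getitem__)
        selected.append(nxt)
        for i, e in enumerate(embeddings):
            min_dist[i] = min(min_dist[i], math.dist(e, embeddings[nxt]))
    return selected


# Toy stand-in for learned embeddings: three tight 2-D clusters of three points.
toy_embeddings = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),
    (10.0, 0.0), (10.1, 0.0), (10.0, 0.1),
]
picked = k_center_greedy(toy_embeddings, budget=3)
# Points 0-2, 3-5, and 6-8 form the three clusters.
clusters_hit = {idx // 3 for idx in picked}
print(picked, clusters_hit)
```

With a labeling budget of three, the sampler takes one point from each cluster rather than three near-duplicates, which is the intuition behind selecting a diverse initial labeled set. The same routine could in principle be run on any other embedding space, such as the consistency-based embeddings the abstract mentions for the query step, though how those embeddings are built is not stated here.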

Taxonomy

Topics: Machine Learning and Algorithms · Machine Learning and Data Classification · Text and Document Classification Technologies