Pruning the Unlabeled Data to Improve Semi-Supervised Learning

Guy Hacohen; Daphna Weinshall

arXiv:2308.14058 · cs.LG · August 29, 2023 · 2 cites

TL;DR

This paper introduces PruneSSL, a method that improves semi-supervised learning by selectively removing less separable unlabeled data, leading to better performance despite using less data.

Contribution

PruneSSL is a novel data pruning technique that enhances SSL by increasing data separability, resulting in state-of-the-art image classification performance.

Findings

01. PruneSSL improves SSL accuracy across multiple datasets.

02. Selectively removing unlabeled examples enhances model performance.

03. State-of-the-art results are achieved with less unlabeled data.

Abstract

In the domain of semi-supervised learning (SSL), the conventional approach involves training a learner with a limited amount of labeled data alongside a substantial volume of unlabeled data, both drawn from the same underlying distribution. However, for deep learning models, this standard practice may not yield optimal results. In this research, we propose an alternative perspective, suggesting that distributions that are more readily separable could offer superior benefits to the learner as compared to the original distribution. To achieve this, we present PruneSSL, a practical technique for selectively removing examples from the original unlabeled dataset to enhance its separability. We present an empirical study, showing that although PruneSSL reduces the quantity of available training data for the learner, it significantly improves the performance of various competitive SSL…
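The idea of pruning unlabeled data for separability can be illustrated with a minimal sketch. The abstract does not say how PruneSSL measures separability, so the score below is an assumption made purely for illustration and is not the authors' method: each unlabeled point is ranked by the gap between its distances to the two nearest class centroids (estimated from the labeled set), and only the points with the largest gap are kept. The names `class_centroids`, `margin`, `prune_unlabeled`, and `keep_fraction` are hypothetical.

```python
import math


def class_centroids(labeled_x, labeled_y):
    """Mean feature vector of each class, computed from the labeled points."""
    sums, counts = {}, {}
    for x, y in zip(labeled_x, labeled_y):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def margin(x, centroids):
    """Assumed separability proxy: gap between the two nearest centroid distances.

    Requires at least two classes. A small gap means the point lies near
    the boundary between two classes.
    """
    dists = sorted(math.dist(x, c) for c in centroids.values())
    return dists[1] - dists[0]


def prune_unlabeled(unlabeled_x, centroids, keep_fraction=0.7):
    """Keep the keep_fraction of unlabeled points with the largest margin."""
    n_keep = max(1, int(len(unlabeled_x) * keep_fraction))
    ranked = sorted(unlabeled_x, key=lambda x: margin(x, centroids), reverse=True)
    return ranked[:n_keep]


if __name__ == "__main__":
    # Two labeled points (one per class) and four unlabeled points.
    centroids = class_centroids([[0.0, 0.0], [4.0, 0.0]], [0, 1])
    unlabeled = [[2.0, 0.0], [0.0, 0.0], [4.0, 1.0], [1.9, 0.0]]
    # Points near the midpoint between the centroids are dropped first.
    print(prune_unlabeled(unlabeled, centroids, keep_fraction=0.5))
```

In a real pipeline the pruned set would then be passed to an SSL method as its unlabeled pool; the features would likely come from a learned representation rather than raw inputs, which is again an assumption and not something the abstract specifies.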

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · COVID-19 diagnosis using AI · Multimodal Machine Learning Applications