An analysis of over-sampling labeled data in semi-supervised learning with FixMatch

Miquel Martí i Rabadán; Sebastian Bujwid; Alessandro Pieropan; Hossein Azizpour; Atsuto Maki

arXiv:2201.00604 · cs.LG · April 11, 2022

TL;DR

This paper investigates whether over-sampling labeled data in semi-supervised learning with FixMatch improves performance. It finds over-sampling most beneficial early in training and less so later, and it highlights the role of true labels in preventing confirmation errors from pseudo-labels.

Contribution

The paper provides a comparative analysis of over-sampling versus uniform sampling of labeled data in semi-supervised learning, highlighting how the two differ across training stages.

Findings

01. Over-sampling labeled data improves early training performance.

02. Uniform sampling leads to a performance drop relative to over-sampling; the drop is larger with fewer labels and diminishes with more labeled data or longer training.

03. Maintaining true labels helps prevent confirmation errors from pseudo-labels.

Abstract

Most semi-supervised learning methods over-sample labeled data when constructing training mini-batches. This paper studies whether this common practice improves learning and how. We compare it to an alternative setting where each mini-batch is uniformly sampled from all the training data, labeled or not, which greatly reduces direct supervision from true labels in typical low-label regimes. However, this simpler setting can also be seen as more general and even necessary in multi-task problems where over-sampling labeled data would become intractable. Our experiments on semi-supervised CIFAR-10 image classification using FixMatch show a performance drop when using the uniform sampling approach, which diminishes when the amount of labeled data or the training time increases. Further, we analyse the training dynamics to understand how over-sampling of labeled data compares to uniform…
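
The two mini-batch constructions contrasted in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the function names, the unlabeled-to-labeled ratio `mu`, the batch sizes, and the dataset split (250 labeled items out of 50,000) are all assumptions chosen for the example, and integers stand in for images.

```python
import random


def oversampled_batch(labeled, unlabeled, n_labeled, mu, rng):
    """Over-sampling: every mini-batch has a fixed labeled share.

    Draws n_labeled labeled items and mu * n_labeled unlabeled items,
    no matter how few labeled items exist overall.
    """
    lab = [rng.choice(labeled) for _ in range(n_labeled)]
    unl = [rng.choice(unlabeled) for _ in range(mu * n_labeled)]
    return lab, unl


def uniform_batch(labeled, unlabeled, batch_size, rng):
    """Uniform sampling: draw the mini-batch from the pooled training set.

    The labeled share of a batch then matches, in expectation,
    the labeled share of the whole dataset.
    """
    pool = [(x, True) for x in labeled] + [(x, False) for x in unlabeled]
    batch = rng.sample(pool, batch_size)  # without replacement
    lab = [x for x, is_labeled in batch if is_labeled]
    unl = [x for x, is_labeled in batch if not is_labeled]
    return lab, unl


def labeled_fraction_oversampled(mu):
    """Labeled share of an over-sampled batch: 1 labeled per mu unlabeled."""
    return 1.0 / (1 + mu)


def labeled_fraction_uniform(n_labeled, n_unlabeled):
    """Expected labeled share of a uniformly sampled batch."""
    return n_labeled / (n_labeled + n_unlabeled)


# Illustrative numbers only (assumptions, not taken from the paper).
rng = random.Random(0)
labeled = list(range(250))            # stand-ins for labeled images
unlabeled = list(range(250, 50_000))  # stand-ins for unlabeled images

lab_o, unl_o = oversampled_batch(labeled, unlabeled, n_labeled=64, mu=7, rng=rng)
lab_u, unl_u = uniform_batch(labeled, unlabeled, batch_size=512, rng=rng)
```

Under these assumed numbers, over-sampling keeps one eighth of each batch labeled, while uniform sampling yields an expected labeled share of 250 / 50,000, which is the reduction in direct supervision that the abstract describes for low-label regimes.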

Code & Models

Repositories

miquelmarti/DenseFixMatch
PyTorch (official)

Taxonomy

Methods: FixMatch