TL;DR
This paper investigates whether over-sampling labeled data when building mini-batches improves semi-supervised learning with FixMatch. It finds over-sampling beneficial early in training but less so later, and shows that supervision from true labels helps prevent confirmation errors from pseudo-labels.
Contribution
It provides a comparative analysis of over-sampling versus uniform sampling of labeled data in semi-supervised learning, and examines how the effect of each changes over the course of training.
Findings
Over-sampling labeled data improves early training performance.
Uniform sampling leads to a performance drop, which shrinks as the amount of labeled data or the training time increases.
Continued supervision from true labels helps prevent confirmation errors from pseudo-labels.
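To make the interplay between true labels and pseudo-labels concrete, here is a minimal sketch of a FixMatch-style objective in plain Python. It is not the paper's code: the function names, the threshold `tau=0.95`, and the weight `lam=1.0` are illustrative assumptions. The unsupervised term trains on the model's own confident predictions, so a confident but wrong prediction reinforces itself. The supervised term is the only part anchored to true labels.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target):
    # Numerically stable: log-sum-exp of the logits minus the target logit.
    m = max(logits)
    log_norm = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_norm - logits[target]

def fixmatch_style_loss(labeled, unlabeled, tau=0.95, lam=1.0):
    """labeled: list of (logits, true_label).
    unlabeled: list of (logits_weak_aug, logits_strong_aug).
    tau and lam are assumed, illustrative values."""
    # Supervised term: cross-entropy against true labels.
    sup = sum(cross_entropy(z, y) for z, y in labeled) / len(labeled) if labeled else 0.0
    unsup_terms = []
    for z_weak, z_strong in unlabeled:
        probs = softmax(z_weak)
        pseudo = probs.index(max(probs))   # pseudo-label from the weak view
        confident = max(probs) >= tau      # keep only confident pseudo-labels
        unsup_terms.append(cross_entropy(z_strong, pseudo) if confident else 0.0)
    # Averaged over all unlabeled examples, including the masked-out ones.
    unsup = sum(unsup_terms) / len(unsup_terms) if unsup_terms else 0.0
    return sup + lam * unsup

labeled = [([2.0, 0.0], 0)]
unlabeled = [([10.0, 0.0], [0.0, 3.0]),   # confident weak prediction: contributes
             ([0.1, 0.0], [0.0, 0.0])]    # low confidence: masked out
loss = fixmatch_style_loss(labeled, unlabeled)
```

If a mini-batch contains no labeled examples, the supervised term is zero and the update is driven entirely by pseudo-labels, which is the situation in which confirmation errors can go uncorrected.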
Abstract
Most semi-supervised learning methods over-sample labeled data when constructing training mini-batches. This paper studies whether this common practice improves learning and how. We compare it to an alternative setting where each mini-batch is uniformly sampled from all the training data, labeled or not, which greatly reduces direct supervision from true labels in typical low-label regimes. However, this simpler setting can also be seen as more general and even necessary in multi-task problems where over-sampling labeled data would become intractable. Our experiments on semi-supervised CIFAR-10 image classification using FixMatch show a performance drop when using the uniform sampling approach which diminishes when the amount of labeled data or the training time increases. Further, we analyse the training dynamics to understand how over-sampling of labeled data compares to uniform…
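The two mini-batch constructions compared in the abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the function names, the 250-label split, the batch size of 512, and the ratio `mu=7` are hypothetical values chosen to show the low-label regime, and the uniform variant assumes that a drawn example counts as labeled only if it belongs to the labeled subset.

```python
import random

def oversampled_batch(labeled_idx, unlabeled_idx, n_labeled, mu, rng):
    """Over-sampling: a fixed number of labeled examples in every batch,
    plus mu times as many unlabeled ones."""
    # With replacement across batches: the small labeled set is revisited often.
    labeled = rng.choices(labeled_idx, k=n_labeled)
    unlabeled = rng.sample(unlabeled_idx, k=mu * n_labeled)
    return labeled, unlabeled

def uniform_batch(labeled_idx, unlabeled_idx, batch_size, rng):
    """Uniform sampling: draw the batch from the pooled training set,
    then split it by whether a label is available."""
    labeled_set = set(labeled_idx)
    pool = list(labeled_idx) + list(unlabeled_idx)
    batch = rng.sample(pool, k=batch_size)
    labeled = [i for i in batch if i in labeled_set]
    unlabeled = [i for i in batch if i not in labeled_set]
    return labeled, unlabeled

rng = random.Random(0)
labeled_idx = list(range(250))            # hypothetical: 250 labeled examples
unlabeled_idx = list(range(250, 50000))   # rest of a CIFAR-10-sized training set

lab_o, unl_o = oversampled_batch(labeled_idx, unlabeled_idx, n_labeled=64, mu=7, rng=rng)
lab_u, unl_u = uniform_batch(labeled_idx, unlabeled_idx, batch_size=512, rng=rng)

# Expected number of labeled examples in a uniformly sampled batch.
expected_labeled_uniform = 512 * len(labeled_idx) / 50000
```

Under these assumed numbers, the over-sampled batch always carries 64 labeled examples out of 512, while a uniformly sampled batch of the same size is expected to carry 512 × 250 / 50000 = 2.56, which illustrates how much direct supervision the uniform setting gives up.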
Taxonomy
Methods: FixMatch
