Enhancing Self-Training Methods
Aswathnarayan Radhakrishnan, Jim Davis, Zachary Rabin, Benjamin Lewis, Matthew Scherreik, Roman Ilin

TL;DR
This paper proposes multiple enhancements to self-training methods in semi-supervised learning to reduce confirmation bias, improve pseudo-label accuracy, and extend applicability to open set unlabeled data, demonstrating consistent performance gains.
Contribution
The paper introduces several enhancements to the self-training pipeline that mitigate confirmation bias and extend self-training to open set unlabeled data, outperforming existing self-training design choices across multiple datasets.
Findings
Performance gains over existing self-training methods
Effective mitigation of confirmation bias
Successful extension to open set unlabeled data
Abstract
Semi-supervised learning approaches train on small sets of labeled data along with large sets of unlabeled data. Self-training is a semi-supervised teacher-student approach that often suffers from "confirmation bias," which occurs when the student model repeatedly overfits to incorrect pseudo-labels given by the teacher model for the unlabeled data. This bias impedes improvements in pseudo-label accuracy across self-training iterations, leading to unwanted saturation in model performance after just a few iterations. In this work, we describe multiple enhancements to the self-training pipeline that mitigate the effect of confirmation bias. We evaluate our enhancements over multiple datasets, showing performance gains over existing self-training design choices. Finally, we also study the extensibility of our enhanced approach to Open Set unlabeled data (containing…
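For readers new to the setup, the loop the abstract describes can be sketched as follows. This is a minimal illustration of generic confidence-thresholded self-training, not the authors' enhanced pipeline: a nearest-centroid classifier stands in for the neural network, and the helper names (`fit_centroids`, `predict`, `self_train`) and the threshold value of 0.9 are hypothetical choices made for this sketch. Filtering pseudo-labels by confidence is a common baseline way to limit confirmation bias; the paper's specific enhancements are not reproduced here.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]
Model = Dict[int, Point]  # hypothetical "model": one centroid per class


def fit_centroids(points: Sequence[Point], labels: Sequence[int]) -> Model:
    """Train a nearest-centroid classifier (stand-in for a neural network)."""
    sums: Dict[int, List[float]] = {}
    for (x, y), c in zip(points, labels):
        s = sums.setdefault(c, [0.0, 0.0, 0.0])  # [sum_x, sum_y, count]
        s[0] += x
        s[1] += y
        s[2] += 1
    return {c: (s[0] / s[2], s[1] / s[2]) for c, s in sums.items()}


def predict(model: Model, p: Point) -> Tuple[int, float]:
    """Return (label, confidence) via a softmax over negative squared distances."""
    scores = {c: -((p[0] - m[0]) ** 2 + (p[1] - m[1]) ** 2) for c, m in model.items()}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {c: math.exp(s - top) for c, s in scores.items()}
    best = max(exps, key=exps.get)
    return best, exps[best] / sum(exps.values())


def self_train(labeled: Sequence[Point], labels: Sequence[int],
               unlabeled: Sequence[Point], iterations: int = 3,
               threshold: float = 0.9) -> Tuple[Model, List[Tuple[Point, int]]]:
    """Generic teacher-student self-training with a confidence threshold (assumed)."""
    teacher = fit_centroids(labeled, labels)
    pseudo: List[Tuple[Point, int]] = []
    for _ in range(iterations):
        # Teacher pseudo-labels the unlabeled pool; low-confidence points are
        # skipped so the student is less likely to fit incorrect labels.
        pseudo = []
        for p in unlabeled:
            c, conf = predict(teacher, p)
            if conf >= threshold:
                pseudo.append((p, c))
        # Student is trained on labeled plus pseudo-labeled data ...
        student = fit_centroids(list(labeled) + [p for p, _ in pseudo],
                                list(labels) + [c for _, c in pseudo])
        # ... and becomes the teacher for the next iteration.
        teacher = student
    return teacher, pseudo


if __name__ == "__main__":
    model, pseudo = self_train([(0.0, 0.0), (4.0, 4.0)], [0, 1],
                               [(0.5, 0.2), (3.8, 4.1), (2.0, 2.0)])
    print(pseudo)
```

In this toy run the ambiguous point (2.0, 2.0) never clears the threshold, so it is never pseudo-labeled. If the teacher's mistakes were instead admitted and reused every iteration, the student would keep reinforcing them, which is the saturation effect the abstract attributes to confirmation bias.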
Taxonomy
Topics: Machine Learning and Data Classification · Domain Adaptation and Few-Shot Learning · Imbalanced Data Classification Techniques
