On semi-supervised learning
Alejandro Cholaquidis, Ricardo Fraiman, Mariela Sued

TL;DR
This paper investigates the conditions under which semi-supervised learning can effectively leverage large amounts of unlabeled data, proposing a new algorithm and analyzing its performance on real phoneme data.
Contribution
It introduces a new semi-supervised learning algorithm with theoretical guarantees under certain conditions and analyzes when semi-supervised methods are effective.
Findings
The algorithm asymptotically attains the performance of the best theoretical rule as the amount of unlabeled data tends to infinity.
Semi-supervised learning effectiveness depends heavily on initial training sample quality.
Performance varies significantly with different initial labeled datasets.
Abstract
Semi-supervised learning deals with the problem of how, if possible, to take advantage of a huge amount of unclassified data to perform classification in situations where, typically, there is little labeled data. Even though this is not always possible (it depends on how useful knowing the distribution of the unlabeled data would be for inferring the labels), several algorithms have been proposed recently. A new algorithm is proposed that, under almost necessary conditions, asymptotically attains the performance of the best theoretical rule as the amount of unlabeled data tends to infinity. The set of necessary assumptions, although reasonable, shows that semi-supervised classification only works for very well conditioned problems. The focus is on understanding when and why semi-supervised learning works…
