The Peaking Phenomenon in Semi-supervised Learning

Jesse H. Krijthe; Marco Loog

arXiv:1610.05160 · stat.ML · October 18, 2016

TL;DR

This paper investigates the peaking phenomenon in semi-supervised learning. It shows that, for a least squares classifier with fewer training objects than dimensions, adding unlabeled data can first increase the error rate before decreasing it, and it explains this behavior using simulations and an approximation of the learning curve.

Contribution

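It extends the study of the peaking phenomenon from supervised to semi-supervised least squares classification, shows that the effect is more pronounced in the semi-supervised setting, and explains the shape of the learning curve using an approximation based on the work of Raudys & Duin.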

Findings

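01. Peaking is more pronounced in the semi-supervised setting than in the supervised one.

02. Adding unlabeled objects can initially increase the error rate before it decreases.

03. The shape of the learning curve is explained through simulation studies and an approximation of the learning curve based on the work of Raudys & Duin.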

Abstract

For the supervised least squares classifier, when the number of training objects is smaller than the dimensionality of the data, adding more data to the training set may first increase the error rate before decreasing it. This, possibly counterintuitive, phenomenon is known as peaking. In this work, we observe that a similar but more pronounced version of this phenomenon also occurs in the semi-supervised setting, where instead of labeled objects, unlabeled objects are added to the training set. We explain why the learning curve rises more steeply and declines more gradually in this setting through simulation studies and by applying an approximation of the learning curve based on the work by Raudys & Duin.
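To make the abstract's setting concrete, here is a minimal simulation sketch in Python with NumPy. It assumes a toy problem of two Gaussian classes; the data generator, its `distance` parameter, and all function names are hypothetical and not taken from the paper. `supervised_ls` fits the least squares classifier with the Moore-Penrose pseudo-inverse, which gives the minimum-norm solution when there are fewer training objects than dimensions. The abstract does not say which semi-supervised learner is used, so `semi_supervised_ls` is only an illustrative assumption: it estimates the second-moment matrix from labeled and unlabeled inputs together and the input-label cross-moment from labeled data alone.

```python
import numpy as np


def add_intercept(X):
    """Append a constant column so the classifier can learn a bias term."""
    return np.hstack([X, np.ones((X.shape[0], 1))])


def sample_two_gaussians(n, d, rng, distance=2.0):
    """Hypothetical toy data: two equiprobable classes, identity covariance,
    labels in {-1, +1}, class means `distance` apart."""
    y = rng.choice([-1.0, 1.0], size=n)
    # Class means are +mean and -mean, so they lie `distance` apart.
    mean = np.full(d, distance / (2.0 * np.sqrt(d)))
    X = rng.standard_normal((n, d)) + y[:, None] * mean
    return X, y


def supervised_ls(X, y):
    """Least squares classifier via the pseudo-inverse.

    With fewer objects than parameters (n < d + 1) the system is
    underdetermined and pinv returns the minimum-norm solution."""
    return np.linalg.pinv(add_intercept(X)) @ y


def semi_supervised_ls(X_lab, y, X_unl, rcond=1e-10):
    """Assumed semi-supervised variant (illustration only): the second-moment
    matrix E[x x^T] is estimated from labeled plus unlabeled inputs, the
    cross-moment E[x y] from labeled data only. With no unlabeled data this
    reduces to supervised_ls."""
    A_lab = add_intercept(X_lab)
    A_all = add_intercept(np.vstack([X_lab, X_unl]))
    second_moment = A_all.T @ A_all / A_all.shape[0]
    cross_moment = A_lab.T @ y / A_lab.shape[0]
    # rcond discards singular values that are only floating-point noise
    # when the second-moment matrix is rank deficient (fewer objects than d + 1).
    return np.linalg.pinv(second_moment, rcond=rcond) @ cross_moment


def error_rate(w, X, y):
    """Fraction of objects whose predicted sign disagrees with the label."""
    return float(np.mean(np.sign(add_intercept(X) @ w) != y))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, n_lab, reps = 50, 10, 20
    X_test, y_test = sample_two_gaussians(2000, d, rng)
    for n_unl in [0, 20, 40, 60, 100, 200, 400]:
        errors = []
        for _ in range(reps):
            X_lab, y_lab = sample_two_gaussians(n_lab, d, rng)
            X_unl, _ = sample_two_gaussians(n_unl, d, rng)
            w = semi_supervised_ls(X_lab, y_lab, X_unl)
            errors.append(error_rate(w, X_test, y_test))
        print(f"unlabeled={n_unl:4d}  mean test error={np.mean(errors):.3f}")
```

Running the script prints the average test error as unlabeled objects are added to a fixed set of 10 labeled objects in 50 dimensions; the printed curve can then be compared with the shape the abstract describes. Sweeping the number of labeled objects passed to `supervised_ls` instead gives the supervised learning curve. The explicit `rcond` cutoff is a design choice so that numerical noise in the rank-deficient second-moment matrix is treated as zero rather than inverted.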

Taxonomy

Topics: Machine Learning and Data Classification · Neural Networks and Applications · Face and Expression Recognition