Learning From Labeled And Unlabeled Data: An Empirical Study Across Techniques And Domains

N. V. Chawla; Grigoris Karakoulas

arXiv:1109.2047·cs.LG·September 12, 2011

TL;DR

This paper empirically evaluates semi-supervised learning techniques across diverse datasets, examining factors like feature relevance, data size, noise, and bias correction, to understand their effectiveness and limitations.

Contribution

It provides a comprehensive empirical analysis of semi-supervised learning methods across multiple domains, including bias correction strategies and the impact of various data conditions.

Findings

01. Bias correction improves semi-supervised learning performance.

02. Feature relevance and data size significantly affect results.

03. Sample-selection bias can lead to poor model accuracy.

Abstract

There has been increased interest in devising learning techniques that combine unlabeled data with labeled data, i.e., semi-supervised learning. However, to the best of our knowledge, no study has been performed across various techniques and different types and amounts of labeled and unlabeled data. Moreover, most of the published work on semi-supervised learning techniques assumes that the labeled and unlabeled data come from the same distribution. It is possible for the labeling process to be associated with a selection bias such that the distributions of data points in the labeled and unlabeled sets are different. Not correcting for such bias can result in biased function approximation with potentially poor performance. In this paper, we present an empirical study of various semi-supervised learning techniques on a variety of datasets. We attempt to answer various questions such as…
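To make the sample-selection bias described in the abstract concrete, here is a minimal toy sketch. It is not the paper's bias correction method: it uses inverse-propensity weighting, a standard technique swapped in for illustration, and it assumes the labeling propensity is known exactly. The population, the `selection_probability` function, and the deterministic selection rule are all hypothetical.

```python
# Toy illustration of sample-selection bias in the labeled set.
# Everything here is a hypothetical setup, not the paper's experiments.

def selection_probability(x):
    """Assumed (known) chance that a point gets labeled: high for x >= 0.5."""
    return 0.9 if x >= 0.5 else 0.1

def build_population(n=1000):
    """Points x = i/n with class 1 when x >= 0.5, so the true positive rate is 0.5."""
    return [(i / n, 1 if i / n >= 0.5 else 0) for i in range(n)]

def biased_labeled_sample(population):
    """Deterministic stand-in for biased labeling:
    keep 90% of points with x >= 0.5 and 10% of points with x < 0.5."""
    labeled = []
    for i, (x, y) in enumerate(population):
        keep = (i % 10 != 0) if x >= 0.5 else (i % 10 == 0)
        if keep:
            labeled.append((x, y))
    return labeled

def naive_positive_rate(sample):
    """Unweighted fraction of positives; biased when selection depends on x."""
    return sum(y for _, y in sample) / len(sample)

def weighted_positive_rate(sample):
    """Inverse-propensity-weighted fraction of positives."""
    weights = [1.0 / selection_probability(x) for x, _ in sample]
    weighted_positives = sum(w * y for w, (_, y) in zip(weights, sample))
    return weighted_positives / sum(weights)

population = build_population()
labeled = biased_labeled_sample(population)

true_rate = naive_positive_rate(population)      # rate over the full population
naive_rate = naive_positive_rate(labeled)        # rate over the biased labeled set
corrected_rate = weighted_positive_rate(labeled) # rate after reweighting

print(f"true={true_rate:.3f} naive={naive_rate:.3f} corrected={corrected_rate:.3f}")
```

In this toy setup the labeled set over-represents the positive class, so the unweighted estimate drifts well away from the population rate, while reweighting each labeled point by the inverse of its assumed labeling probability recovers it. In practice the propensity is usually unknown and has to be estimated, which this sketch does not attempt.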
