Semi-Supervised Classification Based on Classification from Positive and Unlabeled Data
Tomoya Sakai, Marthinus Christoffel du Plessis, Gang Niu, Masashi Sugiyama

TL;DR
This paper introduces a novel semi-supervised classification method that leverages positive, negative, and unlabeled data, providing theoretical guarantees and demonstrating improved performance without relying on traditional distributional assumptions.
Contribution
It extends positive-unlabeled classification to include negative data and establishes generalization bounds that improve with more unlabeled data, without requiring distributional assumptions.
Findings
Generalization bounds decrease with more unlabeled data
Proposed methods outperform existing semi-supervised classifiers
The approach is effective in practical experiments
Abstract
Most of the semi-supervised classification methods developed so far use unlabeled data for regularization purposes under particular distributional assumptions such as the cluster assumption. In contrast, recently developed methods of classification from positive and unlabeled data (PU classification) use unlabeled data for risk evaluation, i.e., label information is directly extracted from unlabeled data. In this paper, we extend PU classification to also incorporate negative data and propose a novel semi-supervised classification approach. We establish generalization error bounds for our novel methods and show that the bounds decrease with respect to the number of unlabeled data without the distributional assumptions that are required in existing semi-supervised classification methods. Through experiments, we demonstrate the usefulness of the proposed methods.
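The idea of using unlabeled data "for risk evaluation" can be illustrated with a small sketch. It is not taken from this page: it assumes the class prior `prior` = p(y = +1) is known, uses the sigmoid loss, and combines the positive-negative (PN) risk with the PU risk through a single weight `gamma` in [0, 1], which is a simplification. All function and parameter names are hypothetical. The key step is that the unlabeled density is a mixture of the two class densities, so the expectation over negatives can be rewritten using positives and unlabeled samples only.

```python
import numpy as np

def sigmoid_loss(margin):
    # Sigmoid loss l(m) = 1 / (1 + exp(m)); small when the margin is large.
    return 1.0 / (1.0 + np.exp(margin))

def pn_risk(g_pos, g_neg, prior, loss=sigmoid_loss):
    # Ordinary supervised risk from positive and negative classifier outputs.
    return prior * loss(g_pos).mean() + (1.0 - prior) * loss(-g_neg).mean()

def pu_risk(g_pos, g_unl, prior, loss=sigmoid_loss):
    # Since p_U = prior * p_P + (1 - prior) * p_N, the negative-class term
    # (1 - prior) * E_N[l(-g)] equals E_U[l(-g)] - prior * E_P[l(-g)].
    return (prior * loss(g_pos).mean()
            + loss(-g_unl).mean()
            - prior * loss(-g_pos).mean())

def pnu_risk(g_pos, g_neg, g_unl, prior, gamma, loss=sigmoid_loss):
    # Assumed simplification: convex combination of PN and PU risks.
    # gamma = 0 ignores unlabeled data; gamma = 1 ignores negative data.
    return ((1.0 - gamma) * pn_risk(g_pos, g_neg, prior, loss)
            + gamma * pu_risk(g_pos, g_unl, prior, loss))

if __name__ == "__main__":
    # Toy 1-D data: positives around +1, negatives around -1, classifier g(x) = x.
    rng = np.random.default_rng(0)
    prior = 0.4
    x_pos = rng.normal(1.0, 1.0, 50)
    x_neg = rng.normal(-1.0, 1.0, 50)
    is_pos = rng.random(5000) < prior
    x_unl = np.where(is_pos, rng.normal(1.0, 1.0, 5000), rng.normal(-1.0, 1.0, 5000))
    print("PN risk :", pn_risk(x_pos, x_neg, prior))
    print("PU risk :", pu_risk(x_pos, x_unl, prior))
    print("PNU risk:", pnu_risk(x_pos, x_neg, x_unl, prior, gamma=0.5))
```

Both `pn_risk` and `pu_risk` estimate the same population risk, so no cluster or similar assumption is needed to make use of the unlabeled samples in this sketch; the combination weight only trades off the variance of the two estimates.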
Taxonomy
Topics: Machine Learning and Data Classification · Machine Learning and Algorithms · Domain Adaptation and Few-Shot Learning
