On the Minimal Supervision for Training Any Binary Classifier from Only Unlabeled Data
Nan Lu, Gang Niu, Aditya Krishna Menon, and Masashi Sugiyama

TL;DR
This paper identifies the minimal supervision needed to train any binary classifier from only unlabeled data. It proves that a single unlabeled dataset is insufficient for unbiased risk estimation, while two datasets with different class priors suffice. It also proposes a consistent ERM-based method that outperforms existing approaches.
Contribution
It establishes the fundamental limits of training binary classifiers from unlabeled data and introduces a novel ERM-based method using two datasets with different class priors.
Findings
Given a single unlabeled dataset, the risk of an arbitrary binary classifier cannot be estimated in an unbiased manner.
Using two unlabeled datasets with different class priors enables unbiased risk estimation.
In experiments, the proposed method can train deep models. It outperforms state-of-the-art methods for learning from two sets of unlabeled data.
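The identity behind the second finding can be worked out under the standard mixture assumption. The notation here is ours and not necessarily the paper's:

- The two unlabeled sets are drawn from p_tr(x) = θ p_+(x) + (1 − θ) p_-(x) and p'_tr(x) = θ' p_+(x) + (1 − θ') p_-(x).
- The priors θ > θ' are known.
- π is the class prior of the test distribution.
- ℓ is a margin loss.

Solving the two mixture equations for p_+ and p_- and substituting into the classification risk expresses the risk through the unlabeled marginals only:

```latex
\begin{align}
p_+(x) &= \frac{(1-\theta')\,p_{\mathrm{tr}}(x) - (1-\theta)\,p'_{\mathrm{tr}}(x)}{\theta-\theta'},
\qquad
p_-(x) = \frac{\theta\,p'_{\mathrm{tr}}(x) - \theta'\,p_{\mathrm{tr}}(x)}{\theta-\theta'},\\
R(g) &= \pi\,\mathbb{E}_{p_+}\!\left[\ell(g(x))\right] + (1-\pi)\,\mathbb{E}_{p_-}\!\left[\ell(-g(x))\right]\\
     &= \mathbb{E}_{p_{\mathrm{tr}}}\!\left[a\,\ell(g(x)) - b\,\ell(-g(x))\right]
      + \mathbb{E}_{p'_{\mathrm{tr}}}\!\left[c\,\ell(-g(x)) - d\,\ell(g(x))\right],\\
a &= \frac{\pi(1-\theta')}{\theta-\theta'},\quad
b = \frac{(1-\pi)\,\theta'}{\theta-\theta'},\quad
c = \frac{(1-\pi)\,\theta}{\theta-\theta'},\quad
d = \frac{\pi(1-\theta)}{\theta-\theta'}.
\end{align}
```

Replacing each expectation with a sample average over the corresponding unlabeled set gives an unbiased estimate of R(g). The division by θ − θ' shows why the priors must differ. When θ = θ', both sets share one marginal, so the system cannot be solved for p_+ and p_-. This is consistent with the impossibility result for a single set.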
Abstract
Empirical risk minimization (ERM), with a proper loss function and regularization, is common practice in supervised classification. In this paper, we study training arbitrary (from linear to deep) binary classifiers from only unlabeled (U) data by ERM. We prove that it is impossible to estimate the risk of an arbitrary binary classifier in an unbiased manner given a single set of U data, but it becomes possible given two sets of U data with different class priors. These two facts answer a fundamental question: what is the minimal supervision for training any binary classifier from only U data? Following these findings, we propose an ERM-based learning method from two sets of U data, and then prove that it is consistent. Experiments demonstrate that the proposed method can train deep models and outperforms state-of-the-art methods for learning from two sets of U data.
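Below is a minimal pure-Python sketch of an unbiased risk estimator from two sets of U data, under these assumptions:

- Each set is a mixture θ p_+ + (1 − θ) p_- of the class-conditional densities.
- The priors θ > θ' are known.
- π is the test class prior.

The weights a, b, c, d come from solving the two mixture equations for p_+ and p_-. The names `uu_risk` and `pn_risk`, the sigmoid loss, and the toy data are hypothetical illustration choices, not the paper's code. Minimizing `uu_risk` over a model class with any optimizer gives an ERM objective of this kind.

```python
import math
from typing import Callable, Iterable, Sequence

Loss = Callable[[float], float]
Model = Callable[[float], float]


def sigmoid_loss(z: float) -> float:
    """Sigmoid loss l(z) = 1 / (1 + exp(z)); an illustrative choice of margin loss."""
    return 1.0 / (1.0 + math.exp(z))


def mean(values: Iterable[float]) -> float:
    values = list(values)
    return sum(values) / len(values)


def uu_risk(g: Model, x_tr: Sequence[float], x_tr2: Sequence[float],
            theta: float, theta2: float, prior: float,
            loss: Loss = sigmoid_loss) -> float:
    """Unbiased estimate of the risk of g from two unlabeled samples (hypothetical helper).

    Assumes x_tr ~ theta*p_+ + (1-theta)*p_-, x_tr2 ~ theta2*p_+ + (1-theta2)*p_-,
    with known theta > theta2, and `prior` is the test class prior pi.
    """
    if theta <= theta2:
        raise ValueError("need theta > theta2: equal priors leave the risk unidentifiable")
    denom = theta - theta2
    a = prior * (1 - theta2) / denom
    b = (1 - prior) * theta2 / denom
    c = (1 - prior) * theta / denom
    d = prior * (1 - theta) / denom
    # Sample averages replace the expectations over the two unlabeled marginals.
    term_tr = mean(a * loss(g(x)) - b * loss(-g(x)) for x in x_tr)
    term_tr2 = mean(c * loss(-g(x)) - d * loss(g(x)) for x in x_tr2)
    return term_tr + term_tr2


def pn_risk(g: Model, x_pos: Sequence[float], x_neg: Sequence[float],
            prior: float, loss: Loss = sigmoid_loss) -> float:
    """Ordinary risk computed from labeled data, used here only as a reference."""
    return (prior * mean(loss(g(x)) for x in x_pos)
            + (1 - prior) * mean(loss(-g(x)) for x in x_neg))


def linear_classifier(x: float) -> float:
    """A fixed toy linear classifier g(x) = 0.5 x."""
    return 0.5 * x


if __name__ == "__main__":
    x_pos, x_neg = [1.0, 2.0], [-1.0, -3.0]
    # Empirical mixtures with exact priors: 6 of 8 points positive (0.75) and 2 of 8 (0.25).
    x_tr = x_pos * 3 + x_neg
    x_tr2 = x_pos + x_neg * 3
    print(uu_risk(linear_classifier, x_tr, x_tr2, 0.75, 0.25, prior=0.3))
    print(pn_risk(linear_classifier, x_pos, x_neg, prior=0.3))
```

The demo builds the two unlabeled sets as exact empirical mixtures of the same positive and negative points. The two printed numbers should therefore agree up to floating-point rounding, even though `uu_risk` never sees a label. The guard on `theta <= theta2` mirrors the requirement that the two class priors differ.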
Taxonomy
Topics: Machine Learning and Data Classification · Machine Learning and Algorithms · Domain Adaptation and Few-Shot Learning
