On the Minimal Supervision for Training Any Binary Classifier from Only Unlabeled Data

Nan Lu, Gang Niu, Aditya Krishna Menon, and Masashi Sugiyama

arXiv:1808.10585 · stat.ML · March 13, 2019 · 23 citations


TL;DR

This paper investigates the minimal supervision needed to train any binary classifier using only unlabeled data. It proves that a single unlabeled dataset is insufficient and that two datasets with different class priors suffice, and it proposes a consistent ERM-based method that outperforms state-of-the-art methods for learning from two sets of unlabeled data.

Contribution

It establishes the fundamental limits of training binary classifiers from unlabeled data and introduces a novel ERM-based method using two datasets with different class priors.
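Why two datasets with different class priors suffice can be seen from a short derivation. This is a sketch in our own notation (which may differ from the paper's), assuming the two unlabeled sets are drawn from mixtures of the class-conditional densities p_+ and p_- with known class priors θ ≠ θ', that π is the known test class prior, and that ℓ is a margin loss evaluated as ℓ(g(x)) on positives and ℓ(-g(x)) on negatives. Solving the two mixture equations for p_+ and p_- rewrites the classification risk using only the two unlabeled marginals, so replacing each expectation by a sample mean over the corresponding unlabeled set gives an unbiased risk estimate.

```latex
\begin{aligned}
p_{\mathrm{tr}}(x) &= \theta\, p_+(x) + (1-\theta)\, p_-(x),
\qquad p'_{\mathrm{tr}}(x) = \theta'\, p_+(x) + (1-\theta')\, p_-(x), \\
p_+(x) &= \frac{(1-\theta')\, p_{\mathrm{tr}}(x) - (1-\theta)\, p'_{\mathrm{tr}}(x)}{\theta - \theta'},
\qquad p_-(x) = \frac{\theta\, p'_{\mathrm{tr}}(x) - \theta'\, p_{\mathrm{tr}}(x)}{\theta - \theta'}, \\
R(g) &= \pi\, \mathbb{E}_{p_+}\!\left[\ell(g(X))\right] + (1-\pi)\, \mathbb{E}_{p_-}\!\left[\ell(-g(X))\right] \\
&= \mathbb{E}_{p_{\mathrm{tr}}}\!\left[a\,\ell(g(X)) - b\,\ell(-g(X))\right]
 + \mathbb{E}_{p'_{\mathrm{tr}}}\!\left[d\,\ell(-g(X')) - c\,\ell(g(X'))\right], \\
a &= \frac{\pi(1-\theta')}{\theta-\theta'}, \quad
b = \frac{(1-\pi)\,\theta'}{\theta-\theta'}, \quad
c = \frac{\pi(1-\theta)}{\theta-\theta'}, \quad
d = \frac{(1-\pi)\,\theta}{\theta-\theta'}.
\end{aligned}
```

This particular rewrite breaks down when θ = θ', since every denominator vanishes.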

Findings

01. Given a single unlabeled dataset, the risk of an arbitrary binary classifier cannot be estimated in an unbiased manner.

02. Given two unlabeled datasets with different class priors, unbiased risk estimation becomes possible.

03. In experiments, the proposed method trains deep models and outperforms state-of-the-art methods for learning from two sets of unlabeled data.
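The first two findings can be illustrated with a toy construction. It is hypothetical and meant only for intuition, not the paper's proof. Two "worlds" over a binary feature x in {0, 1} produce exactly the same unlabeled-data distribution for a known prior θ = 1/2, yet a fixed classifier has different test risks in them. Any estimator computed from that data therefore has the same expectation in both worlds and cannot be unbiased in both. Drawing unlabeled sets with different priors separates the worlds. The helpers `mixture`, `zero_one_risk`, and `classify` are illustrative names, and exact fractions are used so the comparisons are not affected by rounding.

```python
from fractions import Fraction as F


def mixture(theta, p_pos, p_neg):
    """Distribution of x in {0, 1} for an unlabeled set whose class prior is theta."""
    return tuple(theta * a + (1 - theta) * b for a, b in zip(p_pos, p_neg))


def zero_one_risk(pi, p_pos, p_neg, classify):
    """Test risk pi * P(wrong | y=+1) + (1 - pi) * P(wrong | y=-1)."""
    err_pos = sum(p for x, p in enumerate(p_pos) if classify(x) != 1)
    err_neg = sum(p for x, p in enumerate(p_neg) if classify(x) != -1)
    return pi * err_pos + (1 - pi) * err_neg


def classify(x):
    """A fixed classifier g: predict +1 when x == 1, else -1."""
    return 1 if x == 1 else -1


# Two hypothetical worlds, each given as (P(x | y=+1), P(x | y=-1)) over x in {0, 1}.
world_a = ((F(1, 5), F(4, 5)), (F(4, 5), F(1, 5)))
world_b = ((F(1, 2), F(1, 2)), (F(1, 2), F(1, 2)))
pi = F(1, 2)  # test class prior

# One unlabeled set with known prior 1/2: both worlds induce the same data
# distribution, so any estimator built from it has the same expectation in both.
theta = F(1, 2)
same_data = mixture(theta, *world_a) == mixture(theta, *world_b)
# Yet the true risk of the same classifier differs between the worlds.
risk_a = zero_one_risk(pi, *world_a, classify)
risk_b = zero_one_risk(pi, *world_b, classify)

# Unlabeled sets with different priors (4/5 and 1/5) tell the worlds apart.
distinguishable = any(mixture(t, *world_a) != mixture(t, *world_b)
                      for t in (F(4, 5), F(1, 5)))
print(same_data, risk_a, risk_b, distinguishable)  # -> True 1/5 1/2 True
```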

Abstract

Empirical risk minimization (ERM), with a proper loss function and regularization, is the common practice in supervised classification. In this paper, we study training an arbitrary (from linear to deep) binary classifier from only unlabeled (U) data by ERM. We prove that it is impossible to estimate the risk of an arbitrary binary classifier in an unbiased manner given a single set of U data, but it becomes possible given two sets of U data with different class priors. These two facts answer a fundamental question: what the minimal supervision is for training any binary classifier from only U data. Following these findings, we propose an ERM-based learning method from two sets of U data, and then prove that it is consistent. Experiments demonstrate that the proposed method can train deep models and outperforms state-of-the-art methods for learning from two sets of U data.
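The ERM approach in the abstract can be sketched on a toy problem. This is a minimal, hypothetical NumPy sketch, not the code in lunanbit/UUlearning and not the paper's deep-model setup. It assumes the class priors θ, θ' of the two U sets and the test prior π are known. It uses a linear model g(x) = w·x + bias with the logistic loss ℓ and L2 weight decay, and it minimizes the rewritten risk mean_tr[a·ℓ(g) - b·ℓ(-g)] + mean_tr'[d·ℓ(-g) - c·ℓ(g)], whose weights follow from solving the two mixture equations for the class-conditional densities. The synthetic Gaussian data, the priors (0.8, 0.3, 0.5), and the function names `uu_coefficients`, `uu_risk`, and `train_uu` are assumptions made for illustration. Labels are used only to score a held-out test set.

```python
import numpy as np


def logistic_loss(z):
    """Margin loss l(z) = log(1 + exp(-z)), computed stably."""
    return np.logaddexp(0.0, -z)


def logistic_grad(z):
    """Derivative l'(z) = -1 / (1 + exp(z)), written with tanh for stability."""
    return -0.5 * (1.0 - np.tanh(0.5 * z))


def uu_coefficients(theta, theta_p, pi):
    """Weights (a, b, c, d) of the rewritten risk (hypothetical helper); needs theta != theta_p."""
    denom = theta - theta_p
    a = pi * (1 - theta_p) / denom
    b = (1 - pi) * theta_p / denom
    c = pi * (1 - theta) / denom
    d = (1 - pi) * theta / denom
    return a, b, c, d


def uu_risk(w, bias, x_tr, x_tp, coef):
    """Empirical risk mean_tr[a l(g) - b l(-g)] + mean_tp[d l(-g) - c l(g)]."""
    a, b, c, d = coef
    g_tr, g_tp = x_tr @ w + bias, x_tp @ w + bias
    return (np.mean(a * logistic_loss(g_tr) - b * logistic_loss(-g_tr))
            + np.mean(d * logistic_loss(-g_tp) - c * logistic_loss(g_tp)))


def train_uu(x_tr, x_tp, coef, lr=0.5, steps=500, weight_decay=1e-3):
    """Full-batch gradient descent on the UU risk plus L2 penalty on w."""
    a, b, c, d = coef
    w, bias = np.zeros(x_tr.shape[1]), 0.0
    for _ in range(steps):
        g_tr, g_tp = x_tr @ w + bias, x_tp @ w + bias
        # Per-sample dR/dg, using d l(-g)/dg = -l'(-g).
        r_tr = a * logistic_grad(g_tr) + b * logistic_grad(-g_tr)
        r_tp = -d * logistic_grad(-g_tp) - c * logistic_grad(g_tp)
        grad_w = (x_tr.T @ r_tr / len(x_tr) + x_tp.T @ r_tp / len(x_tp)
                  + 2 * weight_decay * w)
        grad_b = r_tr.mean() + r_tp.mean()
        w -= lr * grad_w
        bias -= lr * grad_b
    return w, bias


# Synthetic setup (hypothetical): x | y ~ N(y * mu, I) in 2D.
rng = np.random.default_rng(0)
mu = np.array([1.0, 1.0])


def sample(n, prior):
    """Draw n points that are positive with probability `prior`; returns (x, y)."""
    y = np.where(rng.random(n) < prior, 1.0, -1.0)
    return y[:, None] * mu + rng.standard_normal((n, 2)), y


theta, theta_p, pi = 0.8, 0.3, 0.5  # class priors, assumed known
coef = uu_coefficients(theta, theta_p, pi)
x_tr, _ = sample(2000, theta)       # first U set: labels discarded
x_tp, _ = sample(2000, theta_p)     # second U set: labels discarded
w, bias = train_uu(x_tr, x_tp, coef)

x_te, y_te = sample(2000, pi)       # labeled test set, used only for evaluation
accuracy = np.mean(np.sign(x_te @ w + bias) == y_te)
print(f"UU risk: {uu_risk(w, bias, x_tr, x_tp, coef):.3f}, test accuracy: {accuracy:.3f}")
```

A linear model was chosen so that the sketch stays well behaved. Since ℓ(-z) = ℓ(z) + z for the logistic loss, the negatively weighted terms become linear in g, and the objective is convex whenever θ ≥ π ≥ θ'. With highly flexible models, those negative terms can drive the empirical risk below zero, so this argument does not carry over to deep networks.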


Code & Models

Repositories

lunanbit/UUlearning (TensorFlow, official)


Taxonomy

Topics: Machine Learning and Data Classification · Machine Learning and Algorithms · Domain Adaptation and Few-Shot Learning