Mixture Proportion Estimation and PU Learning: A Modern Approach

Saurabh Garg; Yifan Wu; Alex Smola; Sivaraman Balakrishnan; Zachary C. Lipton

arXiv:2111.00980·cs.LG·November 2, 2021·5 cites

TL;DR

This paper introduces Best Bin Estimation (BBE) for mixture proportion estimation and Conditional Value Ignoring Risk (CVIR) for PU learning. Both outperform previous approaches empirically in high-dimensional settings, and BBE comes with formal guarantees under certain conditions.

Contribution

The paper proposes two simple techniques, BBE for MPE and CVIR for PU learning, and combines them into a single procedure, TED$^n$.
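To make the CVIR idea concrete, here is a minimal sketch of its selection step. The text on this page gives only the name of the objective, so the rule used here is an assumption based on the paper's description: given a mixture proportion estimate `alpha`, the fraction `alpha` of unlabeled examples with the highest loss when treated as negatives is ignored, and the rest are kept as provisional negatives. The function name `cvir_select` and the toy losses are hypothetical, and how the retained examples are weighted against the positive examples in the final objective is not shown.

```python
def cvir_select(unlabeled_losses, alpha):
    """Return indices of unlabeled examples kept as provisional negatives.

    Assumption: each loss is computed with the example labeled negative,
    and the alpha fraction with the highest loss is ignored.
    """
    n = len(unlabeled_losses)
    n_drop = int(round(alpha * n))  # examples presumed to be positives
    # Indices ordered from lowest to highest loss.
    order = sorted(range(n), key=lambda i: unlabeled_losses[i])
    return sorted(order[: n - n_drop])


# Hypothetical per-example losses for five unlabeled points.
losses = [0.1, 2.0, 0.3, 1.5, 0.2]
kept = cvir_select(losses, alpha=0.4)
print(kept)
```

In a training loop, this selection would typically be recomputed as the classifier changes, since the losses change with it.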

Findings

01. Both methods outperform previous approaches empirically.

02. BBE has formal guarantees under certain conditions.

03. TED$^n$ significantly improves mixture estimation and classification.

Abstract

Given only positive examples and unlabeled examples (from both positive and negative classes), we might hope nevertheless to estimate an accurate positive-versus-negative classifier. Formally, this task is broken down into two subtasks: (i) Mixture Proportion Estimation (MPE) -- determining the fraction of positive examples in the unlabeled data; and (ii) PU-learning -- given such an estimate, learning the desired positive-versus-negative classifier. Unfortunately, classical methods for both problems break down in high-dimensional settings. Meanwhile, recently proposed heuristics lack theoretical coherence and depend precariously on hyperparameter tuning. In this paper, we propose two simple techniques: Best Bin Estimation (BBE) (for MPE); and Conditional Value Ignoring Risk (CVIR), a simple objective for PU-learning. Both methods dominate previous approaches empirically, and for BBE,…
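The MPE subtask can be illustrated with a small sketch in the spirit of BBE. It assumes a positive-versus-unlabeled classifier has already produced held-out scores, and it picks the score threshold whose top bin minimizes an upper confidence bound on the ratio of unlabeled mass to positive mass. The function name `best_bin_estimate`, the parameters `delta` and `gamma`, the exact form of the bound, and the synthetic scores are all assumptions for illustration; the abstract above does not specify them, so they should be checked against the paper.

```python
import math


def best_bin_estimate(pos_scores, unl_scores, delta=0.1, gamma=0.01):
    """Estimate the fraction of positives in the unlabeled data.

    Returns (alpha_hat, threshold). The confidence term is an assumed form.
    """
    n_p, n_u = len(pos_scores), len(unl_scores)
    # Slack from finite-sample error in the two empirical tail fractions.
    slack = (math.sqrt(math.log(4 / delta) / (2 * n_u))
             + math.sqrt(math.log(4 / delta) / (2 * n_p)))
    best = None
    for c in sorted(set(pos_scores) | set(unl_scores)):
        q_p = sum(s >= c for s in pos_scores) / n_p  # positives scoring >= c
        q_u = sum(s >= c for s in unl_scores) / n_u  # unlabeled scoring >= c
        if q_p == 0:
            continue
        bound = (q_u + (1 + gamma) * slack) / q_p
        if best is None or bound < best[0]:
            best = (bound, q_u / q_p, c)
    return best[1], best[2]


# Synthetic scores: positives lie in [0.5, 1), negatives in [0, 0.5).
# The unlabeled set is built with 30% positives and 70% negatives.
pos = [0.5 + 0.5 * i / 1000 for i in range(1000)]
unl = ([0.5 + 0.5 * i / 300 for i in range(300)]
       + [0.5 * i / 700 for i in range(700)])
alpha_hat, threshold = best_bin_estimate(pos, unl)
print(round(alpha_hat, 3), round(threshold, 3))
```

On this separable toy data the estimate lands close to the true proportion of 0.3. The loop over thresholds is quadratic in the sample size, which is fine for a sketch but would be replaced by a sorted sweep for large held-out sets.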

Taxonomy

Topics: Machine Learning and Data Classification · Machine Learning and Algorithms · Imbalanced Data Classification Techniques