Mixture Proportion Estimation via Kernel Embedding of Distributions

Harish G. Ramaswamy; Clayton Scott; Ambuj Tewari

arXiv:1603.02501 · cs.LG · June 1, 2016 · 41 citations

Open Access

TL;DR

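This paper introduces an efficient algorithm for mixture proportion estimation based on kernel embeddings of distributions. The authors prove that it converges to the true proportion, derive convergence rates under certain assumptions on the distribution, and note its relevance to weakly supervised learning tasks such as learning from positive and unlabelled samples, learning with label noise, anomaly detection and crowdsourcing.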

Contribution

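It presents what is, to the best of the authors' knowledge, the first efficient algorithm for mixture proportion estimation with a proven convergence rate towards the true proportion, built on RKHS embeddings of distributions.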

Findings

01  The algorithm performs comparably to or better than existing methods on standard datasets.

02  Convergence rates are derived under certain assumptions on the distribution.

03  Implementation only requires solving a simple convex quadratic programming problem a few times.

Abstract

Mixture proportion estimation (MPE) is the problem of estimating the weight of a component distribution in a mixture, given samples from the mixture and component. This problem constitutes a key part in many "weakly supervised learning" problems like learning with positive and unlabelled samples, learning with label noise, anomaly detection and crowdsourcing. While there have been several methods proposed to solve this problem, to the best of our knowledge no efficient algorithm with a proven convergence rate towards the true proportion exists for this problem. We fill this gap by constructing a provably correct algorithm for MPE, and derive convergence rates under certain assumptions on the distribution. Our method is based on embedding distributions onto an RKHS, and implementing it only requires solving a simple convex quadratic programming problem a few times. We run our algorithm…
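
The abstract does not spell out the quadratic program, so the following is only a minimal sketch of the kind of computation that an RKHS-embedding approach to MPE can involve, not the authors' algorithm. It assumes a Gaussian kernel and measures how far the extrapolated embedding lam * mean(mixture) + (1 - lam) * mean(component) lies from the convex hull of the embedded sample points. This is a convex quadratic problem over the probability simplex, solved here with a plain Frank-Wolfe loop (swapped in for a QP solver). The names gaussian_gram and hull_distance, the bandwidth, and the synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np

def gaussian_gram(points, bandwidth=1.0):
    """Gram matrix of the Gaussian kernel on the rows of `points` (shape (N, d))."""
    sq = np.sum(points ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * points @ points.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * bandwidth ** 2))

def hull_distance(mixture, component, lam, bandwidth=1.0, iters=300):
    """RKHS distance from lam*mean(mixture) + (1-lam)*mean(component)
    to the convex hull of all embedded sample points (illustrative only)."""
    n, m = len(mixture), len(component)
    K = gaussian_gram(np.vstack([mixture, component]), bandwidth)
    u_mix = np.concatenate([np.full(n, 1.0 / n), np.zeros(m)])
    u_comp = np.concatenate([np.zeros(n), np.full(m, 1.0 / m)])
    c = lam * u_mix + (1.0 - lam) * u_comp   # coefficients of the target point
    Kc = K @ c
    w = u_mix.copy()                          # feasible start on the simplex
    for _ in range(iters):
        grad = 2.0 * (K @ w - Kc)             # gradient of (w-c)^T K (w-c)
        j = int(np.argmin(grad))              # best simplex vertex
        direction = -w.copy()
        direction[j] += 1.0                   # direction = e_j - w
        gap = -(grad @ direction)             # Frank-Wolfe duality gap
        curvature = direction @ (K @ direction)
        if gap <= 1e-12 or curvature <= 1e-15:
            break
        step = min(1.0, gap / (2.0 * curvature))  # exact line search, clipped
        w += step * direction
    r = w - c
    return float(np.sqrt(max(r @ K @ r, 0.0)))

# Synthetic data (an assumption): component H = N(0, 1), and the mixture
# draws half of its points from H and half from N(3, 1).
rng = np.random.default_rng(0)
component = rng.normal(0.0, 1.0, size=(300, 1))
mixture = np.vstack([rng.normal(3.0, 1.0, size=(150, 1)),
                     rng.normal(0.0, 1.0, size=(150, 1))])
for lam in (1.0, 2.0, 4.0):
    print(lam, hull_distance(mixture, component, lam))
```

For lam between 0 and 1 the target is itself a convex combination of embedded points, so the distance is zero. Once lam is large enough that the extrapolated point no longer corresponds to a distribution, the distance becomes clearly positive. How such a distance curve is turned into a proportion estimate, and with what thresholds, is left to the paper.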

Taxonomy

Topics: Machine Learning and Data Classification · Machine Learning and Algorithms · Gaussian Processes and Bayesian Inference