Learning Arbitrary Statistical Mixtures of Discrete Distributions

Jian Li; Yuval Rabani; Leonard J. Schulman; Chaitanya Swamy

arXiv:1504.02526·cs.LG·April 13, 2015·2 cites

Open Access

TL;DR

This paper introduces efficient algorithms for learning arbitrary mixture models of discrete distributions from noisy, unlabeled samples, with applications in unsupervised learning such as topic modeling and collaborative filtering.

Contribution

It provides the first efficient algorithms for learning arbitrary mixture models of discrete distributions from noisy samples, without any restricting assumptions on the structure of the mixture.

Findings

01 The algorithms infer $ϑ$ to high accuracy in transportation (earthmover) distance.

02 Solution quality is bounded as a function of the sample size $K$ and the number of samples used.

03 Applicable to unsupervised learning tasks such as topic modeling and collaborative filtering.

Abstract

We study the problem of learning from unlabeled samples very general statistical mixture models on large finite sets. Specifically, the model to be learned, $ϑ$, is a probability distribution over probability distributions $p$, where each such $p$ is a probability distribution over $[n] = \{1, 2, \dots, n\}$. When we sample from $ϑ$, we do not observe $p$ directly, but only indirectly and in very noisy fashion, by sampling from $[n]$ repeatedly, independently $K$ times from the distribution $p$. The problem is to infer $ϑ$ to high accuracy in transportation (earthmover) distance. We give the first efficient algorithms for learning this mixture model without making any restricting assumptions on the structure of the distribution $ϑ$. We bound the quality of the solution as a function of the size of the samples $K$ and the number of samples used. Our model…
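The sampling process described in the abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that $ϑ$ is supported on finitely many distributions $p$ (the paper itself places no such restriction), and the function name `sample_k_snapshot` and the `(weight, p)` list representation are hypothetical, not taken from the paper. The sketch only generates observations; it does not implement the learning algorithm.

```python
import random

def sample_k_snapshot(mixture, K, rng):
    """Draw one observation: K i.i.d. items from a hidden p drawn from the mixture.

    mixture: list of (weight, p) pairs, where p is a list of n probabilities
             over [n] = {1, ..., n}. This finite-support form is an assumption
             made for illustration only.
    """
    weights = [w for w, _ in mixture]
    components = [p for _, p in mixture]
    # Step 1: draw the hidden distribution p according to the mixture weights.
    p = rng.choices(components, weights=weights, k=1)[0]
    # Step 2: observe only K independent draws from p, never p itself.
    n = len(p)
    return rng.choices(range(1, n + 1), weights=p, k=K)

# Usage: a toy mixture over [3] with two hidden distributions.
rng = random.Random(0)
toy_mixture = [(0.7, [0.8, 0.1, 0.1]), (0.3, [0.1, 0.1, 0.8])]
observations = [sample_k_snapshot(toy_mixture, K=4, rng=rng) for _ in range(5)]
```

Each element of `observations` is a list of `K` items from $[n]$; the learner sees only these lists and must infer the mixture from them.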

Taxonomy

Topics: Bayesian Methods and Mixture Models · Machine Learning and Algorithms · Text and Document Classification Technologies