Learning Arbitrary Statistical Mixtures of Discrete Distributions
Jian Li, Yuval Rabani, Leonard J. Schulman, Chaitanya Swamy

TL;DR
This paper introduces efficient algorithms for learning complex mixture models of discrete distributions from noisy, unlabeled samples, with applications in unsupervised learning such as topic modeling and collaborative filtering.
Contribution
It provides the first efficient, assumption-free algorithms for learning arbitrary mixture models of discrete distributions from noisy samples.
Findings
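The algorithms recover the mixture to high accuracy in transportation (earthmover) distance.
Solution quality is bounded as a function of the size of each sample and the number of samples used.
The model and results apply to unsupervised learning tasks such as topic modeling and collaborative filtering.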
Abstract
We study the problem of learning from unlabeled samples very general statistical mixture models on large finite sets. Specifically, the model to be learned, ϑ, is a probability distribution over probability distributions p, where each such p is a probability distribution over [n] = {1, 2, ..., n}. When we sample from ϑ, we do not observe p directly, but only indirectly and in very noisy fashion, by sampling from [n] repeatedly, independently K times from the distribution p. The problem is to infer ϑ to high accuracy in transportation (earthmover) distance. We give the first efficient algorithms for learning this mixture model without making any restricting assumptions on the structure of the distribution ϑ. We bound the quality of the solution as a function of the size of the samples K and the number of samples used. Our model…
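The sampling process described in the abstract can be illustrated with a small simulation. This is a minimal sketch, not the paper's algorithm: it only generates data and does no learning. It assumes, for illustration only, that the mixture has finite support (the paper places no such restriction), and the names `sample_observations`, `mixture`, `K`, and `num_samples` are hypothetical. Items are indexed from 0 rather than 1.

```python
import random

def sample_observations(mixture, K, num_samples, seed=0):
    """Draw noisy observations from a mixture of discrete distributions.

    mixture: list of (weight, p) pairs, where p is a list of n probabilities
             over the items 0..n-1 (finite support is an assumption here).
    K: number of independent draws from the hidden p in each sample.
    num_samples: number of samples, each with its own hidden p.
    """
    rng = random.Random(seed)
    weights = [w for w, _ in mixture]
    components = [p for _, p in mixture]
    n = len(components[0])
    observations = []
    for _ in range(num_samples):
        # Pick a hidden distribution p according to the mixture weights.
        p = rng.choices(components, weights=weights, k=1)[0]
        # The learner never sees p, only K independent draws from it.
        draws = rng.choices(range(n), weights=p, k=K)
        observations.append(draws)
    return observations

# Hypothetical two-component mixture over n = 3 items.
mixture = [(0.5, [0.9, 0.1, 0.0]), (0.5, [0.0, 0.2, 0.8])]
obs = sample_observations(mixture, K=4, num_samples=1000)
print(obs[:3])
```

Each entry of `obs` is one sample of K items. With small K, a single sample says very little about its hidden p, which is the sense in which the observations are noisy.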
Taxonomy
Topics: Bayesian Methods and Mixture Models · Machine Learning and Algorithms · Text and Document Classification Technologies
