Source Identification for Mixtures of Product Distributions
Spencer L. Gordon, Bijan Mazaheri, Yuval Rabani, Leonard J. Schulman

TL;DR
This paper presents a new algorithm that identifies the source parameters of a mixture of product distributions on binary variables, with an explicit bound of $2^{O(k^2)} n^{O(k)}$ arithmetic operations and a running time that improves on previous methods.
Contribution
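It gives the first explicit computational complexity bound for source identification in such mixtures. Its running time improves on prior algorithms (Feldman, O'Donnell, and Servedio, FOCS 2005; Chen and Moitra, STOC 2019) that guaranteed only learning the mixture, without identifying the source parameters.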
Findings
Algorithm identifies source parameters from approximate moments.
Runs in $2^{O(k^2)} n^{O(k)}$ arithmetic operations.
Provides a quantitative analysis of source identifiability.
Abstract
We give an algorithm for source identification of a mixture of product distributions on bits. This is a fundamental problem in machine learning with many applications. Our algorithm identifies the source parameters of an identifiable mixture, given, as input, approximate values of multilinear moments (derived, for instance, from a sufficiently large sample), using $2^{O(k^2)} n^{O(k)}$ arithmetic operations. Our result is the first explicit bound on the computational complexity of source identification of such mixtures. The running time improves previous results by Feldman, O'Donnell, and Servedio (FOCS 2005) and Chen and Moitra (STOC 2019) that guaranteed only learning the mixture (without parametric identification of the source). Our analysis gives a quantitative version of a qualitative characterization of identifiable sources that is due to Tahmasebi, Motahari, and…
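To make the algorithm's input concrete, the following is a minimal sketch of how approximate multilinear moments might be estimated from a sample. It is not the paper's identification algorithm; it only illustrates the standard fact that, for a mixture of product distributions on bits, the moment of a subset $S$ of variables is $\mathbb{E}[\prod_{i \in S} X_i] = \sum_j w_j \prod_{i \in S} m_{j,i}$. The names `weights`, `means`, `sample_mixture`, `exact_moment`, and `empirical_moment` are hypothetical, as is the reading of $k$ as the number of sources and $n$ as the number of bits.

```python
import numpy as np


def sample_mixture(weights, means, num_samples, rng):
    """Draw samples from a mixture of k product distributions on n bits.

    weights: shape (k,), mixing probabilities summing to 1 (assumed layout).
    means:   shape (k, n), means[j, i] = P(X_i = 1 | source j) (assumed layout).
    """
    k, _ = means.shape
    # Pick a hidden source for each sample, then draw each bit independently.
    sources = rng.choice(k, size=num_samples, p=weights)
    uniform = rng.random((num_samples, means.shape[1]))
    return (uniform < means[sources]).astype(np.int8)


def exact_moment(weights, means, subset):
    """Multilinear moment E[prod_{i in S} X_i] computed from the true parameters."""
    per_source = np.prod(means[:, list(subset)], axis=1)
    return float(np.dot(weights, per_source))


def empirical_moment(samples, subset):
    """Fraction of samples in which every bit indexed by `subset` equals 1."""
    return float(np.mean(np.all(samples[:, list(subset)] == 1, axis=1)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    weights = np.array([0.5, 0.5])
    means = np.array([[0.2, 0.4, 0.6],
                      [0.8, 0.6, 0.4]])
    samples = sample_mixture(weights, means, 200_000, rng)
    for subset in [(0,), (0, 1), (0, 1, 2)]:
        print(subset, exact_moment(weights, means, subset),
              empirical_moment(samples, subset))
```

The empirical estimates approach the exact moments as the sample grows, which is the sense in which a sufficiently large sample yields approximate moment values.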
Taxonomy
Topics: Machine Learning and Algorithms · Algorithms and Data Compression · Machine Learning and Data Classification
