Identification of Mixtures of Discrete Product Distributions in Near-Optimal Sample and Time Complexity

Spencer L. Gordon; Erik Jahn; Bijan Mazaheri; Yuval Rabani; Leonard J. Schulman

arXiv:2309.13993·cs.LG·September 26, 2023

TL;DR

This paper presents a near-optimal algorithm for identifying mixtures of discrete product distributions with improved sample and time complexity, matching lower bounds across various separation parameters.

Contribution

It introduces a method achieving sample and run-time complexity $(1/\zeta)^{O(k)}$ for any $n \geq 2k-1$, combining robust tensor decomposition with novel bounds on the condition number of matrices called Hadamard extensions.

Findings

01

Achieves sample and runtime complexity $(1/\zeta)^{O(k)}$

02

Matches lower bounds for a broad range of separation parameters

03

Extends the known $e^{\Omega(k)}$ lower bound to match the upper bound across a broad range of $\zeta$

Abstract

We consider the problem of identifying, from statistics, a distribution of discrete random variables $X_{1}, \dots, X_{n}$ that is a mixture of $k$ product distributions. The best previous sample complexity for $n \in O(k)$ was $(1/\zeta)^{O(k^{2} \log k)}$ (under a mild separation assumption parameterized by $\zeta$). The best known lower bound was $\exp(\Omega(k))$. It is known that $n \geq 2k - 1$ is necessary and sufficient for identification. We show, for any $n \geq 2k - 1$, how to achieve sample complexity and run-time complexity $(1/\zeta)^{O(k)}$. We also extend the known lower bound of $e^{\Omega(k)}$ to match our upper bound across a broad range of $\zeta$. Our results are obtained by combining (a) a classic method for robust tensor decomposition, (b) a novel way of bounding the condition number of key matrices called Hadamard extensions, by studying their action only on flattened rank-1…
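To make the objects in the abstract concrete, the following is a minimal illustrative sketch, not the paper's algorithm. It assumes binary variables, a matrix `m` whose entry `m[i, j]` is the probability that $X_i = 1$ under component $j$, and mixture weights `pi`. It also assumes that the Hadamard extension has one row per subset $S$ of variables, equal to the entrywise product of the rows of `m` indexed by $S$; that definition is not stated in the abstract. All names (`hadamard_extension`, `joint_moments_brute_force`) are hypothetical. Under these assumptions, the joint moments of the mixture equal the Hadamard extension times `pi`, which suggests why its condition number would govern how robustly the mixture can be recovered from statistics.

```python
from itertools import combinations, product

import numpy as np


def hadamard_extension(m):
    """Assumed definition: one row per subset S of the n variables,
    equal to the entrywise product of the rows m[i] for i in S
    (the empty subset gives the all-ones row)."""
    n, k = m.shape
    subsets = [s for r in range(n + 1) for s in combinations(range(n), r)]
    h = np.ones((len(subsets), k))
    for row, s in enumerate(subsets):
        for i in s:
            h[row] *= m[i]
    return subsets, h


def joint_moments_brute_force(m, pi, subsets):
    """E[prod_{i in S} X_i] for each subset S, computed by summing the
    mixture's probability mass function over all 2^n binary outcomes."""
    n, _ = m.shape
    moments = np.zeros(len(subsets))
    for outcome in product([0, 1], repeat=n):
        x = np.array(outcome)
        # Probability of this outcome under each product component.
        per_component = np.prod(np.where(x[:, None] == 1, m, 1 - m), axis=0)
        p = per_component @ pi
        for row, s in enumerate(subsets):
            if all(x[i] == 1 for i in s):
                moments[row] += p
    return moments


k = 2
n = 2 * k - 1  # the identifiability threshold quoted in the abstract
rng = np.random.default_rng(0)
m = rng.uniform(0.1, 0.9, size=(n, k))  # arbitrary illustrative parameters
pi = np.array([0.3, 0.7])

subsets, h = hadamard_extension(m)
moments = joint_moments_brute_force(m, pi, subsets)
cond = np.linalg.cond(h)  # 2-norm condition number via singular values

print("moments match H(m) @ pi:", np.allclose(h @ pi, moments))
print("condition number of H(m):", cond)
```

The brute-force check enumerates all $2^n$ outcomes, so it is only practical for very small $n$; it is there to confirm the identity numerically, not to suggest how the statistics would be estimated from samples.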

Taxonomy

Topics: Tensor decomposition and applications · Sparse and Compressive Sensing Techniques · Machine Learning and Algorithms