Learning Balanced Mixtures of Discrete Distributions with Small Sample
Shuheng Zhou

TL;DR
This paper introduces a graph-based method for partitioning small samples from mixtures of discrete distributions by leveraging high-dimensional features, enabling effective clustering with limited data.
Contribution
It presents a novel approach using maximum-weight balanced cuts on graphs derived from high-dimensional data to accurately partition mixture distributions with small samples.
Findings
High-dimensional features enable clustering with small samples.
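For balanced instances, the method returns the correct partition with high probability, provided the number of features and the sample size are large enough relative to the divergence between the distributions.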
Trade-offs between feature dimension and sample size are demonstrated.
Abstract
We study the problem of partitioning a small sample of $n$ individuals from a mixture of $k$ product distributions over a Boolean cube $\{0, 1\}^K$ according to their distributions. Each distribution is described by a vector of allele frequencies in $\mathbb{R}^K$. Given two distributions, we use $\gamma$ to denote the average $\ell_2^2$ distance in frequencies across $K$ dimensions, which measures the statistical divergence between them. We study the case assuming that bits are independently distributed across $K$ dimensions. This work demonstrates that, for a balanced input instance for $k = 2$, a certain graph-based optimization function returns the correct partition with high probability, where a weighted graph $G$ is formed over $n$ individuals, whose pairwise Hamming distances between their corresponding bit vectors define the edge weights, so long as $K = \Omega(\ln n/\gamma)$ and $Kn =…
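The graph construction described above can be illustrated with a minimal Python sketch. It draws bit vectors from two product distributions, uses pairwise Hamming distances as edge weights, and finds a maximum-weight balanced cut by brute-force enumeration. The enumeration is an assumption made purely for illustration (it is exponential in the sample size, and the page does not say how the cut is computed), all function names are hypothetical, and the "average distance in frequencies" is assumed here to be the mean squared difference.

```python
import itertools
import random


def sample_individual(freqs, rng):
    """Draw one bit vector; bit i is 1 with probability freqs[i], independently."""
    return [1 if rng.random() < p else 0 for p in freqs]


def hamming(u, v):
    """Number of coordinates in which two bit vectors differ."""
    return sum(a != b for a, b in zip(u, v))


def average_squared_distance(p, q):
    """Mean squared difference between two frequency vectors (assumed divergence measure)."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) / len(p)


def max_weight_balanced_cut(points):
    """Brute-force search over balanced bipartitions (illustrative only, small n).

    Edge weights are pairwise Hamming distances; the weight of a cut is the
    sum of the weights of all edges crossing it.
    """
    n = len(points)
    if n % 2 != 0:
        raise ValueError("a balanced cut needs an even number of individuals")
    w = [[hamming(points[i], points[j]) for j in range(n)] for i in range(n)]
    best_side, best_weight = None, -1
    # Keep individual 0 on a fixed side so each cut is enumerated once.
    for rest in itertools.combinations(range(1, n), n // 2 - 1):
        side = {0, *rest}
        other = [j for j in range(n) if j not in side]
        weight = sum(w[i][j] for i in side for j in other)
        if weight > best_weight:
            best_side, best_weight = side, weight
    return best_side, best_weight


# Toy instance: 4 individuals from each of two well-separated distributions.
rng = random.Random(0)
K = 200
p, q = [0.2] * K, [0.8] * K
points = [sample_individual(p, rng) for _ in range(4)]
points += [sample_individual(q, rng) for _ in range(4)]
side, weight = max_weight_balanced_cut(points)
print("side containing individual 0:", sorted(side), "cut weight:", weight)
```

Individuals from different distributions tend to be farther apart in Hamming distance than individuals from the same one, so the heaviest balanced cut tends to separate the two groups. The toy parameters are chosen to make that gap large.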
Taxonomy
Topics: Bayesian Methods and Mixture Models · Machine Learning and Algorithms · Algorithms and Data Compression
