A Distribution Testing Approach to Clustering Distributions
Gunjan Kumar, Yash Pote, Jonathan Scarlett

TL;DR
This paper studies the clustering of distributions into two groups through the lens of distribution testing. It gives upper and lower bounds on the sample complexity, tight up to a logarithmic factor, both when one cluster's distribution is known and when both are unknown.
Contribution
The paper establishes upper and lower bounds on the sample complexity of distribution clustering in two cases: when the distribution of one cluster is known, and when both distributions are unknown.
Findings
Sample complexity bounds depend on the domain size, the number of distributions, the size of one of the clusters, and the total variation distance between the two cluster distributions.
Achieves tight bounds up to a logarithmic factor for all parameter regimes.
Provides theoretical foundations for distribution clustering in different settings.
Abstract
We study the following distribution clustering problem: given a hidden partition of distributions into two groups, such that the distributions within each group are the same and the two distributions associated with the two clusters are ε-far in total variation, the goal is to recover the partition. We establish upper and lower bounds on the sample complexity for two fundamental cases: (1) when the distribution of one of the clusters is known, and (2) when both are unknown. Our upper and lower bounds characterize the sample complexity's dependence on the domain size, the number of distributions, the size of one of the clusters, and the distance ε. In particular, we achieve tightness with respect to these parameters (up to a logarithmic factor) for all regimes.
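To make the problem statement concrete, the following is a minimal sketch of the known-distribution case on a toy instance. It is not the paper's algorithm: it uses a naive plug-in rule (compare each empirical distribution to the known one and threshold at half the separation), and it makes no claim about sample optimality. The names `total_variation`, `empirical`, and `naive_cluster_known`, the two example distributions, and the choice of 2000 samples per distribution are all assumptions made for illustration.

```python
import random
from collections import Counter


def total_variation(p, q):
    """Total variation distance between two distributions on the same finite domain."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def empirical(samples, n):
    """Empirical distribution of `samples` over the domain {0, ..., n-1}."""
    counts = Counter(samples)
    return [counts[i] / len(samples) for i in range(n)]


def naive_cluster_known(known, sample_sets, eps):
    """Naive plug-in rule (illustrative only, not the paper's tester).

    Labels a sample set True when its empirical distribution is within
    eps / 2 of `known` in total variation, and False otherwise.
    """
    n = len(known)
    return [total_variation(empirical(s, n), known) < eps / 2 for s in sample_sets]


# Toy instance: a hypothetical domain of size 4 and five hidden distributions.
rng = random.Random(0)
n = 4
p = [0.25, 0.25, 0.25, 0.25]  # known distribution of the first cluster
q = [0.70, 0.10, 0.10, 0.10]  # distribution of the second cluster (hidden from the rule)
eps = total_variation(p, q)   # separation between the two cluster distributions

hidden = [True, False, True, True, False]  # True means "drawn from p"
sample_sets = [
    rng.choices(range(n), weights=p if in_p else q, k=2000)
    for in_p in hidden
]

recovered = naive_cluster_known(p, sample_sets, eps)
print(recovered == hidden)
```

With this many samples per distribution the plug-in rule separates the two clusters easily on such a small domain. The question the paper addresses is how few samples suffice as the domain size, the number of distributions, the cluster size, and the separation vary.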
Taxonomy
Topics: Complexity and Algorithms in Graphs · Machine Learning and Algorithms · Facility Location and Emergency Management
