A Distribution Testing Approach to Clustering Distributions

Gunjan Kumar; Yash Pote; Jonathan Scarlett

arXiv:2512.08376·cs.DS·December 10, 2025

Open Access

TL;DR

This paper takes a distribution-testing approach to recovering a hidden two-cluster partition of $k$ distributions, with sample complexity bounds for the case where one cluster's distribution is known and the case where both are unknown.

Contribution

It establishes upper and lower bounds on the sample complexity of distribution clustering that match up to an $O(\log k)$ factor, both when one cluster's distribution is known and when both are unknown.

Findings

01

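The sample complexity bounds depend on the domain size $n$, the number of distributions $k$, the size $r$ of one of the clusters, and the total variation distance $\varepsilon$ between the two cluster distributions.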

02

The bounds are tight with respect to $(n, k, r, \varepsilon)$, up to an $O(\log k)$ factor, in all parameter regimes.

03

Covers two fundamental settings: one in which one cluster's distribution is known, and one in which both are unknown.

Abstract

We study the following distribution clustering problem: Given a hidden partition of $k$ distributions into two groups, such that the distributions within each group are the same, and the two distributions associated with the two clusters are $\varepsilon$-far in total variation, the goal is to recover the partition. We establish upper and lower bounds on the sample complexity for two fundamental cases: (1) when one of the clusters' distributions is known, and (2) when both are unknown. Our upper and lower bounds characterize the sample complexity's dependence on the domain size $n$, number of distributions $k$, size $r$ of one of the clusters, and distance $\varepsilon$. In particular, we achieve tightness with respect to $(n, k, r, \varepsilon)$ (up to an $O(\log k)$ factor) for all regimes.
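To make the problem setup concrete, the following is a minimal Python sketch of case (1), where one cluster's distribution $p$ is known. It is not the paper's algorithm: it is a naive plug-in baseline that estimates each source's empirical distribution and thresholds its total variation distance to $p$ at $\varepsilon/2$. The function names (`tv_distance`, `make_instance`, `naive_cluster`), the uniform choice of $p$, the paired perturbation used to build $q$, and the per-source sample count `m` are all assumptions made for illustration.

```python
import random
from collections import Counter


def tv_distance(p, q):
    """Total variation distance between two distributions on the same finite domain."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def make_instance(n, k, r, eps, rng):
    """Toy instance (assumed construction): k - r sources follow the uniform
    distribution p, and r sources follow a perturbed q with TV(p, q) = eps."""
    assert n % 2 == 0 and 0 < eps <= 0.5 and 0 <= r <= k
    p = [1.0 / n] * n
    half = n // 2
    # Raise each mass in the first half by 2*eps/n and lower each in the second half.
    q = [(1 + 2 * eps) / n] * half + [(1 - 2 * eps) / n] * half
    far = set(rng.sample(range(k), r))
    labels = [1 if i in far else 0 for i in range(k)]
    return p, q, labels


def naive_cluster(p, sources, eps, m, rng):
    """Plug-in baseline (not the paper's method): label a source 1 if its
    empirical distribution from m samples is more than eps/2 away from p."""
    n = len(p)
    guess = []
    for dist in sources:
        counts = Counter(rng.choices(range(n), weights=dist, k=m))
        empirical = [counts[x] / m for x in range(n)]
        guess.append(1 if tv_distance(empirical, p) > eps / 2 else 0)
    return guess


if __name__ == "__main__":
    rng = random.Random(0)
    n, k, r, eps = 10, 8, 3, 0.2
    p, q, labels = make_instance(n, k, r, eps, rng)
    sources = [q if lab else p for lab in labels]
    recovered = naive_cluster(p, sources, eps, m=20000, rng=rng)
    print("partition recovered:", recovered == labels)
```

This baseline learns every source's distribution to accuracy well below $\varepsilon$, which is generally wasteful in the number of samples; it is meant only to illustrate the roles of $n$, $k$, $r$, and $\varepsilon$ in the problem statement.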

Taxonomy

Topics: Complexity and Algorithms in Graphs · Machine Learning and Algorithms · Facility Location and Emergency Management