Semidefinite programming on population clustering: a global analysis
Shuheng Zhou

TL;DR
This paper provides a theoretical analysis of semidefinite programming and spectral methods for clustering small samples from two sub-Gaussian populations, especially when features are of low quality, establishing conditions for successful classification.
Contribution
It offers a theoretical foundation for the empirical success of spectral clustering and semidefinite relaxation in high-dimensional, low-quality feature settings for population mixture separation.
Findings
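Classification succeeds when $np = \Omega(1/\gamma^2)$, where $n$ is the sample size, $p$ the number of features, and $\gamma$ the average feature quality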
Semidefinite relaxation aligns with spectral method performance
Tradeoffs between sample size and features are characterized
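The tradeoff between sample size and feature count can be explored with a small simulation. The sketch below is illustrative only: it assumes a toy Gaussian model in which the two population means differ by sqrt(gamma) in every coordinate (one possible reading of "average quality"), and it uses a plain spectral partition via power iteration rather than the semidefinite program analyzed in the paper. All function names (`sample_mixture`, `spectral_partition`, `accuracy`, `run`) are hypothetical.

```python
import random


def sample_mixture(n, p, gamma, rng):
    """Draw n points in R^p, half from each population (assumed toy model).

    The two means differ by sqrt(gamma) in every coordinate, so the squared
    distance between the means is gamma * p. Noise is standard Gaussian.
    """
    half_gap = gamma ** 0.5 / 2.0
    labels = [0] * (n // 2) + [1] * (n - n // 2)
    data = []
    for label in labels:
        mean = half_gap if label == 0 else -half_gap
        data.append([rng.gauss(mean, 1.0) for _ in range(p)])
    return data, labels


def spectral_partition(data, rng, iters=100):
    """Split rows by the sign of the leading eigenvector of the centered Gram matrix."""
    n, p = len(data), len(data[0])
    col_means = [sum(row[j] for row in data) / n for j in range(p)]
    centered = [[row[j] - col_means[j] for j in range(p)] for row in data]
    gram = [[sum(a * b for a, b in zip(u, v)) for v in centered] for u in centered]
    vec = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for _ in range(iters):  # power iteration on the n x n Gram matrix
        vec = [sum(g * x for g, x in zip(row, vec)) for row in gram]
        norm = sum(x * x for x in vec) ** 0.5
        vec = [x / norm for x in vec]
    return [0 if x >= 0 else 1 for x in vec]


def accuracy(pred, labels):
    """Fraction classified correctly, up to swapping the two cluster names."""
    agree = sum(a == b for a, b in zip(pred, labels))
    return max(agree, len(labels) - agree) / len(labels)


def run(n, p, gamma, seed=0):
    """Sample a mixture, partition it spectrally, and report the accuracy."""
    rng = random.Random(seed)
    data, labels = sample_mixture(n, p, gamma, rng)
    return accuracy(spectral_partition(data, rng), labels)
```

Calling `run(n, p, gamma)` with `n` held fixed and `p` increasing gives a rough feel for how additional features can compensate for a small sample under these assumptions.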
Abstract
In this paper, we consider the problem of partitioning a small data sample of size $n$ drawn from a mixture of two sub-gaussian distributions. Our work is motivated by the application of clustering individuals according to their population of origin using $p$ markers, when the divergence between the two populations is small. We are interested in the case that individual features are of low average quality $\gamma$, and we want to use as few of them as possible to correctly partition the sample. We consider the semidefinite relaxation of an integer quadratic program which is formulated essentially as finding the maximum cut on a graph where edge weights in the cut represent dissimilarity scores between two nodes based on their features. A small simulation result in Blum, Coja-Oghlan, Frieze and Zhou (2007, 2009) shows that even when the sample size $n$ is small, by increasing $p$ so that $np=…
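For orientation, the textbook maximum-cut integer quadratic program and its standard semidefinite relaxation can be written as follows. The symbols here are introduced for illustration only: $w_{ij} = w_{ji}$ is a dissimilarity weight between sample points $i$ and $j$, $x$ is the vector of cluster labels, and $Z$ is the relaxed matrix variable. The paper's exact objective and constraints may differ.

```latex
\begin{align*}
\text{(integer program)} \quad & \max_{x \in \{-1,+1\}^n} \; \frac{1}{4} \sum_{i,j=1}^{n} w_{ij}\,(1 - x_i x_j), \\
\text{(SDP relaxation)} \quad & \max_{Z \succeq 0,\; Z_{ii} = 1} \; \frac{1}{4} \sum_{i,j=1}^{n} w_{ij}\,(1 - Z_{ij}).
\end{align*}
```

Setting $Z = x x^T$ recovers the integer program; the relaxation keeps the positive semidefinite and unit-diagonal constraints but drops the rank-one requirement on $Z$.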
Taxonomy
Topics: Facility Location and Emergency Management · Advanced Clustering Algorithms Research
