Semidefinite programming on population clustering: a global analysis

Shuheng Zhou

arXiv:2301.00344 · math.ST · January 5, 2023

TL;DR

This paper gives a theoretical analysis of semidefinite programming and spectral methods for clustering small samples drawn from a mixture of two sub-Gaussian populations when individual features are of low quality, and establishes conditions under which classification succeeds.

Contribution

It provides a theoretical foundation for the empirical success of spectral clustering and semidefinite relaxation in separating population mixtures from high-dimensional, low-quality features.
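To make "spectral clustering of a two-population sample" concrete, here is a minimal pure-Python sketch. It is a generic variant, not necessarily the exact procedure analyzed in the paper or in Blum, Coja-Oghlan, Frieze and Zhou: center each feature, take the top eigenvector of the n × n Gram matrix by power iteration, and split the sample by sign. The Bernoulli data model and the names `simulate`, `spectral_partition`, `accuracy`, and `gap` are hypothetical. In this toy model every feature has squared mean difference `gap**2` between the two populations.

```python
import math
import random


def simulate(n, p, gap, seed=0):
    """Toy two-population data (hypothetical model, not taken from the paper).

    Feature k has base frequency f_k ~ Uniform(0.2, 0.8); population 0 uses
    f_k + d_k and population 1 uses f_k - d_k, with d_k = +/- gap/2 at random.
    Entries are Bernoulli draws, hence bounded and sub-Gaussian.
    """
    rng = random.Random(seed)
    freqs = [rng.uniform(0.2, 0.8) for _ in range(p)]
    shifts = [rng.choice((-1.0, 1.0)) * gap / 2 for _ in range(p)]
    labels = [i % 2 for i in range(n)]  # balanced sample
    X = []
    for lab in labels:
        sign = 1.0 if lab == 0 else -1.0
        X.append([1.0 if rng.random() < f + sign * d else 0.0
                  for f, d in zip(freqs, shifts)])
    return X, labels


def spectral_partition(X, iters=200, seed=1):
    """Split rows of X by the sign of the top eigenvector of the centered Gram matrix."""
    n, p = len(X), len(X[0])
    # Center each feature (column) at its sample mean.
    means = [sum(row[k] for row in X) / n for k in range(p)]
    Z = [[row[k] - means[k] for k in range(p)] for row in X]
    # n x n Gram matrix G = Z Z^T of centered inner products (symmetric).
    G = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            G[i][j] = G[j][i] = sum(a * b for a, b in zip(Z[i], Z[j]))
    # Power iteration; G is positive semidefinite, so it converges to the top eigenvector.
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        w = [sum(g * x for g, x in zip(G[i], v)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return [0 if x >= 0 else 1 for x in v]


def accuracy(pred, labels):
    """Fraction correctly grouped, up to swapping the two cluster names."""
    agree = sum(a == b for a, b in zip(pred, labels)) / len(labels)
    return max(agree, 1.0 - agree)


if __name__ == "__main__":
    X, labels = simulate(n=40, p=2000, gap=0.2)
    print(accuracy(spectral_partition(X), labels))
```

Holding `gap` fixed and shrinking `p` or `n` should eventually degrade the accuracy of this sketch, which is the sample-size versus feature trade-off the paper studies; the toy run says nothing about the constants in the paper's bounds.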

Findings

01

Successful classification when $np = \Omega(1/\gamma^2)$

02

Semidefinite relaxation aligns with spectral method performance

03

Trade-offs between the sample size $n$ and the number of features $p$ are characterized
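The scaling in the first finding can be turned into back-of-the-envelope arithmetic. If $np \ge C/\gamma^2$ for some constant $C$, then at fixed feature quality $\gamma$, halving the sample size $n$ doubles the number of features $p$ needed. The sketch below assumes $C = 1$ purely for illustration, since the $\Omega(\cdot)$ statement does not specify the constant; `min_features` is a hypothetical helper.

```python
import math
from fractions import Fraction


def min_features(n, gamma, C=1):
    """Smallest integer p with n * p >= C / gamma**2.

    C stands in for the unspecified constant hidden by Omega(.); C=1 is a
    placeholder assumption, not a value from the paper. Pass gamma as a
    string such as "0.01" so Fraction stores it exactly (a float would be
    converted from its binary approximation).
    """
    gamma = Fraction(gamma)
    return math.ceil(Fraction(C) / (n * gamma ** 2))


# With gamma = 0.01, 1/gamma^2 = 10000, so p shrinks as n grows.
for n in (25, 50, 100, 200):
    print(n, min_features(n, "0.01"))
```

For $\gamma = 0.01$ this prints 400, 200, 100, and 50 features for $n = 25, 50, 100, 200$: the product $np$ stays at $1/\gamma^2 = 10^4$.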

Abstract

In this paper, we consider the problem of partitioning a small data sample of size $n$ drawn from a mixture of $2$ sub-gaussian distributions. Our work is motivated by the application of clustering individuals according to their population of origin using markers, when the divergence between the two populations is small. We are interested in the case that individual features are of low average quality $\gamma$, and we want to use as few of them as possible to correctly partition the sample. We consider semidefinite relaxation of an integer quadratic program which is formulated essentially as finding the maximum cut on a graph where edge weights in the cut represent dissimilarity scores between two nodes based on their features. A small simulation result in Blum, Coja-Oghlan, Frieze and Zhou (2007, 2009) shows that even when the sample size $n$ is small, by increasing $p$ so that $np=…
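The integer quadratic program described in the abstract can be written in the standard maximum-cut form below. This is a generic sketch assuming symmetric nonnegative dissimilarity weights $w_{ij}$ with $w_{ii} = 0$ and cluster labels $x_i \in \{-1,+1\}$; the paper's exact weights, normalization, and any additional constraints may differ. The relaxation replaces the rank-one matrix $xx^\top$ by any positive semidefinite matrix with unit diagonal, as in the Goemans–Williamson relaxation of max cut.

```latex
% Integer quadratic program: each pair {i, j} split by the cut contributes w_{ij}
% (the double sum counts each pair twice, hence the factor 1/4).
\max_{x \in \{-1,+1\}^n} \; \frac{1}{4} \sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij}\,(1 - x_i x_j)

% Semidefinite relaxation: X takes the place of x x^\top.
\max_{X \in \mathbb{R}^{n \times n}} \; \frac{1}{4} \sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij}\,(1 - X_{ij})
\quad \text{subject to} \quad X \succeq 0, \;\; X_{ii} = 1 \;\; (i = 1, \dots, n)
```

Every $x \in \{-1,+1\}^n$ yields a feasible $X = xx^\top$, so the relaxed optimum upper-bounds the integer one. A partition can then be read off the relaxed solution, for example from the signs of its leading eigenvector.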

Taxonomy

Topics: Facility Location and Emergency Management · Advanced Clustering Algorithms Research