Likelihood Adjusted Semidefinite Programs for Clustering Heterogeneous Data
Yubo Zhuang, Xiaohui Chen, Yun Yang

TL;DR
This paper introduces iLA-SDP, a novel likelihood-adjusted semidefinite programming method for clustering heterogeneous data, which improves stability and accuracy over traditional methods like EM and K-means.
Contribution
The paper extends SDP-based clustering to heterogeneous data by integrating cluster labels as parameters, avoiding centroid estimation, and enhancing stability and exact recovery.
Findings
iLA-SDP achieves lower mis-clustering errors than existing methods.
iLA-SDP is less sensitive to initialization and more stable in high-dimensional settings.
Numerical experiments validate the superior performance of iLA-SDP.
Abstract
Clustering is a widely deployed unsupervised learning tool. Model-based clustering is a flexible framework to tackle data heterogeneity when the clusters have different shapes. Likelihood-based inference for mixture distributions often involves non-convex and high-dimensional objective functions, imposing difficult computational and statistical challenges. The classic expectation-maximization (EM) algorithm is a computationally thrifty iterative method that maximizes a surrogate function minorizing the log-likelihood of observed data in each iteration, which however suffers from bad local maxima even in the special case of the standard Gaussian mixture model with common isotropic covariance matrices. On the other hand, recent studies reveal that the unique global solution of a semidefinite programming (SDP) relaxed K-means achieves the information-theoretically sharp threshold for…
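The EM iteration the abstract refers to can be illustrated with a minimal sketch. This is not the paper's iLA-SDP method; it is a textbook EM loop for a one-dimensional, two-component Gaussian mixture with unequal variances (a toy stand-in for heterogeneous clusters). The function names, the synthetic data, and the starting values are all assumptions made for illustration. Each iteration maximizes a surrogate that minorizes the log-likelihood, so the recorded log-likelihood should not decrease from one iteration to the next.

```python
import math
import random


def component_densities(x, weights, means, variances):
    """Weighted Gaussian density of x under each mixture component."""
    return [
        w * math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2.0 * math.pi * v)
        for w, m, v in zip(weights, means, variances)
    ]


def log_likelihood(xs, weights, means, variances):
    """Observed-data log-likelihood of the mixture."""
    return sum(
        math.log(sum(component_densities(x, weights, means, variances)))
        for x in xs
    )


def em_step(xs, weights, means, variances):
    """One EM iteration: E-step responsibilities, then closed-form M-step."""
    # E-step: posterior probability of each component for each point.
    resp = []
    for x in xs:
        joint = component_densities(x, weights, means, variances)
        norm = sum(joint)
        resp.append([j / norm for j in joint])
    # M-step: weighted sample proportions, means, and variances.
    new_w, new_m, new_v = [], [], []
    for j in range(len(weights)):
        nj = sum(r[j] for r in resp)
        mj = sum(r[j] * x for r, x in zip(resp, xs)) / nj
        vj = sum(r[j] * (x - mj) ** 2 for r, x in zip(resp, xs)) / nj
        new_w.append(nj / len(xs))
        new_m.append(mj)
        new_v.append(vj)
    return new_w, new_m, new_v


# Synthetic heterogeneous data (assumed for illustration): two clusters
# with different spreads.
random.seed(0)
xs = [random.gauss(-2.0, 0.5) for _ in range(100)]
xs += [random.gauss(3.0, 1.5) for _ in range(100)]

# Hypothetical starting point: (weights, means, variances).
params = ([0.5, 0.5], [-1.0, 1.0], [1.0, 1.0])
history = [log_likelihood(xs, *params)]
for _ in range(30):
    params = em_step(xs, *params)
    history.append(log_likelihood(xs, *params))

print("initial log-likelihood:", history[0])
print("final log-likelihood:  ", history[-1])
```

The monotone ascent only guarantees convergence to a stationary point. Rerunning the loop from a different starting point can end at a different, worse local maximum, which is the sensitivity to initialization that the abstract describes.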
Taxonomy
Topics: Machine Learning and Algorithms · Statistical Methods and Inference
