Likelihood Adjusted Semidefinite Programs for Clustering Heterogeneous Data

Yubo Zhuang; Xiaohui Chen; Yun Yang

arXiv:2209.15097·stat.ML·May 30, 2023·1 cites

TL;DR

This paper introduces iLA-SDP, a novel likelihood-adjusted semidefinite programming method for clustering heterogeneous data, which improves stability and accuracy over traditional methods like EM and K-means.

Contribution

The paper extends SDP-based clustering to heterogeneous data by integrating cluster labels as parameters, avoiding centroid estimation, and enhancing stability and exact recovery.

Findings

01

iLA-SDP achieves lower mis-clustering errors than existing methods.

02

iLA-SDP is less sensitive to initialization and more stable in high-dimensional settings.

03

Numerical experiments support the improved performance of iLA-SDP.

Abstract

Clustering is a widely deployed unsupervised learning tool. Model-based clustering is a flexible framework for tackling data heterogeneity when the clusters have different shapes. Likelihood-based inference for mixture distributions often involves non-convex and high-dimensional objective functions, which pose difficult computational and statistical challenges. The classic expectation-maximization (EM) algorithm is a computationally thrifty iterative method that, in each iteration, maximizes a surrogate function minorizing the log-likelihood of the observed data. However, it suffers from bad local maxima even in the special case of the standard Gaussian mixture model with common isotropic covariance matrices. On the other hand, recent studies reveal that the unique global solution of a semidefinite programming (SDP) relaxed $K$-means achieves the information-theoretically sharp threshold for…
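The EM iteration described in the abstract can be illustrated with a minimal sketch. This is not the paper's iLA-SDP method; it is plain EM for a one-dimensional Gaussian mixture with a common unit variance, and the function names (`log_likelihood`, `em_step`, `run_em`), the synthetic data, and the starting means are all assumptions made for illustration. It records the observed-data log-likelihood after each step, which EM never decreases; that guarantee says nothing about reaching the global maximum, which is the weakness the abstract points to.

```python
import math
import random


def log_likelihood(xs, weights, means):
    """Observed-data log-likelihood of a 1D Gaussian mixture with unit variance."""
    norm = math.sqrt(2.0 * math.pi)
    total = 0.0
    for x in xs:
        density = sum(
            w * math.exp(-0.5 * (x - m) ** 2) / norm
            for w, m in zip(weights, means)
        )
        total += math.log(density)
    return total


def em_step(xs, weights, means):
    """One EM update of the mixing weights and component means."""
    # E-step: posterior probability that each point belongs to each component.
    resp = []
    for x in xs:
        scores = [w * math.exp(-0.5 * (x - m) ** 2) for w, m in zip(weights, means)]
        total = sum(scores)
        resp.append([s / total for s in scores])

    # M-step: weighted averages maximize the surrogate (minorizing) function.
    n = len(xs)
    new_weights, new_means = [], []
    for j in range(len(means)):
        mass = sum(r[j] for r in resp)
        new_weights.append(mass / n)
        new_means.append(sum(r[j] * x for r, x in zip(resp, xs)) / mass)
    return new_weights, new_means


def run_em(xs, means, iters=50):
    """Run EM from the given starting means and track the log-likelihood."""
    weights = [1.0 / len(means)] * len(means)
    history = [log_likelihood(xs, weights, means)]
    for _ in range(iters):
        weights, means = em_step(xs, weights, means)
        history.append(log_likelihood(xs, weights, means))
    return weights, means, history


# Hypothetical synthetic data: two unit-variance clusters centered at -2 and +2.
random.seed(0)
data = [random.gauss(-2.0, 1.0) for _ in range(100)]
data += [random.gauss(2.0, 1.0) for _ in range(100)]

weights, means, history = run_em(data, means=[-1.0, 1.0])
```

Rerunning `run_em` from different starting means is a simple way to observe the sensitivity to initialization: different starts can settle at different stationary points with different final values in `history`.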

Videos

Likelihood Adjusted Semidefinite Programs for Clustering Heterogeneous Data · SlidesLive

Taxonomy

Topics: Machine Learning and Algorithms · Statistical Methods and Inference