Sketch-and-Lift: Scalable Subsampled Semidefinite Program for $K$-means Clustering
Yubo Zhuang, Xiaohui Chen, Yun Yang

TL;DR
This paper introduces a scalable, linear-time algorithm for approximate $K$-means clustering that solves an SDP on a subsample of the data. It achieves high accuracy and exact recovery thresholds similar to those of the full SDP.
Contribution
The paper presents the sketch-and-lift (SL) method, a novel scalable approach that approximates the SDP relaxation of $K$-means clustering with theoretical guarantees and practical efficiency.
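For reference, the standard SDP relaxation of $K$-means (Peng and Wei, 2007) is shown below. This assumes it is the relaxation SL solves on the subsample; the summary here does not state the exact program. Let $X = (X_1,\dots,X_n) \in \mathbb{R}^{p\times n}$ and $A = X^\top X$:

```latex
\hat{Z} \in \arg\max_{Z \in \mathbb{R}^{n\times n}} \langle A, Z \rangle
\quad \text{subject to} \quad
Z = Z^\top,\; Z \succeq 0,\; \operatorname{tr}(Z) = K,\; Z\mathbf{1}_n = \mathbf{1}_n,\; Z \ge 0 .
```

Any partition $G_1,\dots,G_K$ yields the feasible point $Z = \sum_{k=1}^{K} |G_k|^{-1}\mathbf{1}_{G_k}\mathbf{1}_{G_k}^\top$ (its trace is $K$ and each row sums to $1$). For this $Z$, $\langle A, Z\rangle = \sum_{k} |G_k|^{-1}\sum_{i,j\in G_k}\langle X_i, X_j\rangle$. Because the within-cluster sum of squares equals $\sum_i \|X_i\|^2$ minus this quantity, maximizing $\langle A, Z\rangle$ is equivalent to minimizing the $K$-means objective.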
Findings
SL achieves exact recovery thresholds similar to those of the full SDP.
SL outperforms state-of-the-art fast clustering algorithms in accuracy.
SL significantly reduces runtime while maintaining high statistical accuracy.
Abstract
Semidefinite programming (SDP) is a powerful tool for tackling a wide range of computationally hard problems such as clustering. Despite their high accuracy, semidefinite programs are often too slow in practice and scale poorly to large (or even moderate) datasets. In this paper, we introduce a linear-time algorithm for approximating an SDP-relaxed $K$-means clustering. The proposed sketch-and-lift (SL) approach solves an SDP on a subsampled dataset and then propagates the solution to all data points by a nearest-centroid rounding procedure. It is shown that the SL approach enjoys an exact recovery threshold similar to that of the $K$-means SDP on the full dataset, which is known to be information-theoretically tight under the Gaussian mixture model. The SL method can be made adaptive, with enhanced theoretical properties, when the cluster sizes are unbalanced. Our simulation…
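The subsample-then-propagate pipeline from the abstract can be sketched as follows. This is a minimal illustration and not the authors' implementation. The SDP solve on the subsample is abstracted as a caller-supplied function `cluster_subsample` (a hypothetical name). Drawing the subsample uniformly without replacement is also an assumption. The names `centroids_from_labels`, `nearest_centroid`, and `sketch_and_lift` are illustrative only. The adaptive variant for unbalanced clusters is not modeled.

```python
import math
import random
from typing import Callable, List, Sequence

# Illustrative aliases (not from the paper).
Point = Sequence[float]
# Hypothetical hook: takes the m subsampled points and K, returns labels 0..K-1.
# In SL this role is played by the SDP solve (plus rounding) on the subsample.
SubsampleClusterer = Callable[[List[Point], int], List[int]]


def centroids_from_labels(points: List[Point], labels: List[int], k: int) -> List[List[float]]:
    """Mean of the points carrying each label 0..k-1."""
    dim = len(points[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p, lab in zip(points, labels):
        counts[lab] += 1
        for j in range(dim):
            sums[lab][j] += p[j]
    if 0 in counts:
        raise ValueError("each cluster needs at least one subsampled point")
    return [[s / c for s in row] for row, c in zip(sums, counts)]


def nearest_centroid(points: Sequence[Point], centroids: List[List[float]]) -> List[int]:
    """Lift step: give each point the label of its closest centroid (Euclidean)."""
    return [
        min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
        for p in points
    ]


def sketch_and_lift(
    points: Sequence[Point],
    k: int,
    m: int,
    cluster_subsample: SubsampleClusterer,
    seed: int = 0,
) -> List[int]:
    rng = random.Random(seed)
    # Sketch: draw m of the n indices uniformly without replacement (assumed scheme).
    idx = rng.sample(range(len(points)), m)
    sub = [points[i] for i in idx]
    # Cluster only the subsample; in SL the SDP would run here on an m x m problem.
    sub_labels = cluster_subsample(sub, k)
    cents = centroids_from_labels(sub, sub_labels, k)
    # Lift: one pass over all n points, O(n * k * p) time.
    return nearest_centroid(points, cents)


if __name__ == "__main__":
    # Two well-separated toy groups of 10 points each in 2-D.
    data = [(-5.0 + 0.1 * i, 0.0) for i in range(10)] + [(5.0 + 0.1 * i, 0.0) for i in range(10)]

    # Toy stand-in for the SDP: split the subsample by the sign of the first coordinate.
    def by_sign(sub: List[Point], k: int) -> List[int]:
        return [0 if p[0] < 0 else 1 for p in sub]

    # m = 11 > 10 guarantees that both groups appear in the subsample.
    print(sketch_and_lift(data, k=2, m=11, cluster_subsample=by_sign))  # ten 0s, then ten 1s
```

The lift step compares each of the $n$ points with $K$ centroids in $p$ dimensions, so it runs in $O(nKp)$ time, linear in $n$. The cost of the clustering step depends on the subsample size $m$, not on $n$.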
Taxonomy
Topics: Machine Learning and Algorithms · Face and Expression Recognition · Advanced Image and Video Retrieval Techniques
