Semi-supervised K-means++

Jordan Yoder; Carey E. Priebe

arXiv:1602.00360 · stat.ML · February 2, 2016 · 1 citation

TL;DR

This paper extends the k-means++ initialization method and its analysis to semi-supervised clustering, where some of the data are pre-labeled. It derives an improved theoretical bound on expected cost and reports improved performance on simulated and real data.

Contribution

The paper introduces a semi-supervised version of k-means++ that accounts for pre-labeled data, with a new analysis that yields an improved bound on expected clustering cost and better clustering results in practice.

Findings

01  Improved theoretical bound on expected clustering cost.

02  Improved clustering performance on simulated and real datasets.

03  Theoretical justification for a roughly linear semi-supervised clustering algorithm.

Abstract

Traditionally, practitioners initialize the k-means algorithm with centers chosen uniformly at random. Randomized initialization with uneven weights (k-means++) has recently been used to improve the performance over this strategy in cost and run-time. We consider the k-means problem with semi-supervised information, where some of the data are pre-labeled, and we seek to label the rest according to the minimum cost solution. By extending the k-means++ algorithm and analysis to account for the labels, we derive an improved theoretical bound on expected cost and observe improved performance in simulated and real data examples. This analysis provides theoretical justification for a roughly linear semi-supervised clustering algorithm.
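
To make the idea in the abstract concrete, here is a minimal Python sketch of one way a label-aware k-means++ seeding could look. It is an illustration under stated assumptions, not the authors' published procedure: the function name ss_kmeanspp_init, the convention that -1 marks an unlabeled point, and the choice to seed each labeled class at the mean of its labeled points are all assumptions made for this example. The remaining centers are drawn from the unlabeled points with probability proportional to squared distance from the nearest center already chosen, which is the standard k-means++ weighting.

```python
import numpy as np


def ss_kmeanspp_init(X, labels, k, rng=None):
    """Choose k initial centers using partial labels (illustrative sketch).

    X      : (n, d) array of points.
    labels : length-n array; a class index in [0, k) for labeled points,
             -1 for unlabeled points (a convention assumed for this example).
    k      : total number of clusters.
    rng    : seed or numpy Generator, for reproducibility.
    """
    rng = np.random.default_rng(rng)
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)

    # Assumption: every class that has labeled points is seeded at their mean.
    seen = np.unique(labels[labels >= 0])
    centers = [X[labels == c].mean(axis=0) for c in seen]

    unlabeled = X[labels < 0]
    if not centers:
        # No labels at all: fall back to the usual uniform first draw.
        centers.append(unlabeled[rng.integers(len(unlabeled))])

    while len(centers) < k:
        # Squared distance from each unlabeled point to its nearest center.
        diffs = unlabeled[:, None, :] - np.asarray(centers)[None, :, :]
        d2 = (diffs ** 2).sum(axis=2).min(axis=1)
        total = d2.sum()
        # Guard against the degenerate case where every distance is zero.
        probs = d2 / total if total > 0 else np.full(len(d2), 1.0 / len(d2))
        centers.append(unlabeled[rng.choice(len(unlabeled), p=probs)])

    return np.asarray(centers)
```

The returned array can be passed as the starting centers of any Lloyd-style k-means iteration. With no labeled points the sketch reduces to ordinary k-means++ seeding, and with at least one labeled point in every class no random draws are made at all.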

Taxonomy

Topics: Advanced Clustering Algorithms Research · Face and Expression Recognition · Bayesian Methods and Mixture Models