Deterministic Feature Selection for $k$-means Clustering

Christos Boutsidis; Malik Magdon-Ismail

arXiv:1109.5664·cs.LG·November 17, 2016

TL;DR

This paper introduces a deterministic feature selection algorithm for $k$-means clustering with provable theoretical guarantees. Earlier algorithms with such guarantees are randomized and fail with constant probability.

Contribution

The paper presents the first deterministic algorithm for feature selection in k-means with proven theoretical performance guarantees.

Findings

01

The algorithm's theoretical guarantees hold deterministically, with no failure probability.

02

It removes the constant failure probability of earlier randomized algorithms that have provable guarantees.

03

The approach is built on a deterministic method for decompositions of the identity.

Abstract

We study feature selection for $k$-means clustering. Although the literature contains many methods with good empirical performance, algorithms with provable theoretical behavior have only recently been developed. Unfortunately, these algorithms are randomized and fail with, say, a constant probability. We address this issue by presenting a deterministic feature selection algorithm for $k$-means with theoretical guarantees. At the heart of our algorithm lies a deterministic method for decompositions of the identity.
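
To make the problem setting concrete, the following is a minimal sketch of how a feature selection method for $k$-means might be evaluated: cluster using only the selected columns, then measure the $k$-means cost of that partition on the full data. It is not the algorithm from the paper. The selector `select_features_by_leverage` is a hypothetical stand-in (a simple deterministic leverage-score heuristic), and the function names, the synthetic data, and the parameters `k`, `n`, `d`, `r` are all assumptions made for illustration.

```python
import numpy as np

def kmeans_cost(X, labels, k):
    """Sum of squared distances from each row of X to its cluster centroid."""
    cost = 0.0
    for j in range(k):
        members = X[labels == j]
        if len(members) > 0:
            cost += ((members - members.mean(axis=0)) ** 2).sum()
    return cost

def lloyd(X, k, n_iter=50):
    """Plain Lloyd iterations with a deterministic start (first k rows as centers)."""
    centers = X[:k].copy()
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Squared distance from every point to every center.
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def select_features_by_leverage(X, k, r):
    """Hypothetical stand-in selector (NOT the paper's method): keep the r
    columns with the largest leverage scores with respect to the top-k
    right singular vectors of X."""
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    scores = (Vt[:k] ** 2).sum(axis=0)
    return np.sort(np.argsort(-scores, kind="stable")[:r])

# Synthetic data (an assumption): 3 clusters separated along features 0 and 1,
# plus 18 pure-noise features. Rows are interleaved so that the first k rows
# contain one point from each cluster.
rng = np.random.default_rng(0)
k, n, d, r = 3, 150, 20, 5
true_labels = np.arange(n) % k
X = 0.1 * rng.standard_normal((n, d))
X[:, 0] += 5.0 * (true_labels == 1)
X[:, 1] += 5.0 * (true_labels == 2)

cols = select_features_by_leverage(X, k, r)
labels_sub = lloyd(X[:, cols], k)    # cluster on selected features only
labels_full = lloyd(X, k)            # baseline: cluster on all features

# Both partitions are scored on the FULL data, so the ratio measures how much
# clustering quality is lost by looking at only r of the d features.
ratio = kmeans_cost(X, labels_sub, k) / kmeans_cost(X, labels_full, k)
print("selected columns:", cols)
print("cost ratio (selected / all features):", ratio)
```

A ratio close to 1 means the partition found on the selected features is about as good as the one found on all features. Because Lloyd's iterations only find a local optimum, this ratio is an empirical diagnostic, not the kind of worst-case guarantee the paper proves.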
