Robust and sparse k-means clustering for high-dimensional data

Sarka Brodinova; Peter Filzmoser; Thomas Ortner; Christian Breiteneder; Maia Zaharieva

arXiv:1709.10012·stat.ME·September 29, 2017·Adv. Data Anal. Classif.

TL;DR

This paper introduces a robust, sparse k-means clustering method that effectively identifies groups, outliers, and relevant variables in high-dimensional data without prior knowledge, using a weighted and penalized approach.

Contribution

It proposes a novel k-means-based algorithm with automatic observation weighting and a lasso-type penalty to handle noise and outliers in high-dimensional clustering.

Findings

01

Successfully identifies groups, outliers, and informative variables in simulated data.

02

Shows an advantage over existing methods on real-world datasets.

03

Provides a framework for selecting the number of clusters and variables.

Abstract

In real-world application scenarios, the identification of groups poses a significant challenge due to possibly occurring outliers and existing noise variables. Therefore, there is a need for a clustering method which is capable of revealing the group structure in data containing both outliers and noise variables without any pre-knowledge. In this paper, we propose a $k$-means-based algorithm incorporating a weighting function which leads to an automatic weight assignment for each observation. In order to cope with noise variables, a lasso-type penalty is used in an objective function adjusted by observation weights. We finally introduce a framework for selecting both the number of clusters and variables based on a modified gap statistic. The conducted experiments on simulated and real-world data demonstrate the advantage of the method to identify groups, outliers, and informative…
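To make the idea of a lasso-type penalty on an observation-weighted objective concrete, here is a minimal sketch in Python. It is not the authors' implementation. It assumes the variable-weight step follows the standard sparse k-means form (maximize the weighted between-cluster sum of squares subject to an L2 bound of 1 and an L1 bound `s` on non-negative variable weights). It also assumes that the observation weights `u` enter as simple multiplicative weights in the sums of squares. The function names, the bound `s`, and the way `u` is used are all illustrative assumptions. The paper's own weighting function is not reproduced here.

```python
import numpy as np

def weighted_bcss(X, labels, u):
    """Per-variable between-cluster sum of squares with observation weights u.

    Assumption: u multiplies each squared deviation, so an observation
    with u = 0 (e.g. a flagged outlier) has no influence at all.
    """
    X = np.asarray(X, dtype=float)
    u = np.asarray(u, dtype=float)
    labels = np.asarray(labels)
    overall = np.average(X, axis=0, weights=u)
    total = (u[:, None] * (X - overall) ** 2).sum(axis=0)
    within = np.zeros(X.shape[1])
    for k in np.unique(labels):
        m = labels == k
        if u[m].sum() == 0:
            continue  # cluster carries no weight, contributes nothing
        center = np.average(X[m], axis=0, weights=u[m])
        within += (u[m][:, None] * (X[m] - center) ** 2).sum(axis=0)
    return total - within

def _normalize(v):
    """Scale v to unit L2 norm; leave an all-zero vector unchanged."""
    n = np.linalg.norm(v)
    return v / n if n > 0 else v

def update_variable_weights(b, s, n_iter=100):
    """Lasso-type update: soft-threshold b, then rescale to unit L2 norm.

    The threshold is found by bisection so that the L1 norm of the
    result does not exceed s (s >= 1 is required for feasibility).
    """
    if s < 1:
        raise ValueError("s must be at least 1")
    b = np.maximum(np.asarray(b, dtype=float), 0.0)
    w = _normalize(b)
    if w.sum() <= s:
        return w  # L1 bound already satisfied, no thresholding needed
    lo, hi = 0.0, b.max()
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if _normalize(np.maximum(b - mid, 0.0)).sum() > s:
            lo = mid
        else:
            hi = mid
    return _normalize(np.maximum(b - hi, 0.0))
```

In a full procedure these two steps would alternate with a cluster-assignment step and an update of `u`. A smaller `s` drives more variable weights to exactly zero, which is how noise variables would be dropped under these assumptions.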
