CS Sparse K-means: An Algorithm for Cluster-Specific Feature Selection in High-Dimensional Clustering

Xiangrui Zeng; Hongyu Zheng

arXiv:1909.12384 · stat.ME · October 7, 2019 · 1 citation

Open Access

TL;DR

This paper introduces CS Sparse K-means, a K-means-based clustering algorithm for high-dimensional data that selects informative features and identifies which clusters, or pairs of clusters, each selected feature separates. The goal is better interpretability and clustering accuracy in applications such as genomics.

Contribution

It proposes an EM-based clustering method with lasso-type constraints on each cluster pair. The method detects features that separate only specific cluster pairs, a case that existing methods such as Sparse K-means rarely handle.
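To make "features relevant to specific cluster pairs" concrete, here is a hypothetical illustration. It is not the authors' EM algorithm, which places lasso-type constraints on each cluster pair inside the M step. For fixed cluster labels, the sketch scores feature j for each pair (k, l) by the between-cluster sum of squares of the two-cluster subproblem, n_k n_l / (n_k + n_l) (mean_kj - mean_lj)^2. It then soft-thresholds each pair's scores with a lasso-style penalty `lam`. The helper names, the score, the penalty value, and the synthetic data are all assumptions chosen for illustration.

```python
import itertools

import numpy as np


def pairwise_separation(X, labels):
    """Hypothetical per-pair score for every cluster pair (k, l) and feature j:
    n_k * n_l / (n_k + n_l) * (mean_kj - mean_lj)**2, i.e. the between-cluster
    sum of squares of the two-cluster subproblem restricted to clusters k and l."""
    scores = {}
    for k, l in itertools.combinations(np.unique(labels).tolist(), 2):
        Xk, Xl = X[labels == k], X[labels == l]
        nk, nl = len(Xk), len(Xl)
        diff = Xk.mean(axis=0) - Xl.mean(axis=0)
        scores[(k, l)] = nk * nl / (nk + nl) * diff ** 2
    return scores


def select_pair_features(scores, lam):
    """Lasso-style soft-thresholding applied separately to each pair: a feature
    is kept for pair (k, l) only if its score exceeds the penalty lam."""
    return {pair: np.flatnonzero(np.maximum(s - lam, 0.0)) for pair, s in scores.items()}


# Demo (synthetic, illustrative): feature 0 separates cluster 0 from the others,
# feature 1 separates cluster 2 from the others, features 2 and 3 are noise.
rng = np.random.default_rng(0)
X = rng.normal(scale=0.1, size=(30, 4))
labels = np.repeat([0, 1, 2], 10)
X[labels == 0, 0] += 3.0
X[labels == 2, 1] += 3.0
selected = select_pair_features(pairwise_separation(X, labels), lam=5.0)
for pair, feats in selected.items():
    print(pair, feats.tolist())
# (0, 1) [0]
# (0, 2) [0, 1]
# (1, 2) [1]
```

Feature 0 is kept only for the pairs involving cluster 0, and feature 1 only for the pairs involving cluster 2. A single global weight vector would keep both features without recording which pairs each one separates.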

Findings

1. Effective on simulated data
2. Successful application to leukemia gene expression data
3. Identifies features specific to cluster pairs

Abstract

Feature selection is an important and challenging task in high-dimensional clustering. For example, in genomics, there may be only a small number of differentially expressed genes that are informative about the overall clustering structure. Existing feature selection methods, such as Sparse K-means, rarely tackle the problem of accounting for features that can separate only a subset of clusters. In genomics, it is highly likely that a gene can only distinguish one subtype from all the other subtypes, or distinguish one pair of subtypes but not others. In this paper, we propose a K-means-based clustering algorithm that discovers informative features as well as which cluster pairs are separable by each selected feature. The method is essentially an EM algorithm, in which we introduce lasso-type constraints on each cluster pair in the M step, and make the E step possible by maximizing the…

Taxonomy

Topics: Gene expression and cancer classification · Advanced Clustering Algorithms Research · Face and Expression Recognition

Methods: Feature Selection