CS Sparse K-means: An Algorithm for Cluster-Specific Feature Selection in High-Dimensional Clustering
Xiangrui Zeng, Hongyu Zheng

TL;DR
This paper introduces CS Sparse K-means, a K-means-based clustering algorithm that identifies cluster-specific features in high-dimensional data, with the aim of improving interpretability and accuracy in applications such as genomics.
Contribution
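It proposes an EM-based clustering method with lasso-type constraints that detects which features are relevant to which cluster pairs, addressing a limitation of existing methods such as Sparse K-means, which rarely account for features that separate only a subset of clusters.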
Findings
Effective on simulated data
Successful application to leukemia gene expression data
Identifies features specific to cluster pairs
Abstract
Feature selection is an important and challenging task in high-dimensional clustering. For example, in genomics, only a small number of genes may be differentially expressed and therefore informative about the overall clustering structure. Existing feature selection methods, such as Sparse K-means, rarely tackle the problem of accounting for features that can separate only a subset of clusters. In genomics, it is highly likely that a gene can only define one subtype against all the other subtypes, or distinguish one pair of subtypes but not others. In this paper, we propose a K-means-based clustering algorithm that discovers informative features as well as which cluster pairs are separable by each selected feature. The method is essentially an EM algorithm, in which we introduce lasso-type constraints on each cluster pair in the M step, and make the E step possible by maximizing the…
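To make the idea of a feature that separates only some cluster pairs concrete, here is a minimal Python sketch. It is not the authors' EM algorithm, whose details are cut off in the abstract above. It assumes cluster labels are already known, scores each feature for each cluster pair by a standardized difference of cluster means, and applies a lasso-type soft threshold so that uninformative features get a weight of exactly zero. The function names, the scoring rule, and the threshold `lam` are all hypothetical choices made for illustration.

```python
import itertools
import numpy as np


def soft_threshold(x, lam):
    """Lasso-type shrinkage: move x toward zero by lam, clipping at zero."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)


def pair_specific_weights(X, labels, lam):
    """Return sparse per-feature separation weights for every cluster pair.

    Hypothetical helper (not from the paper): the score for a feature is the
    absolute difference of the two cluster means divided by that feature's
    overall standard deviation, followed by soft-thresholding.
    """
    clusters = np.unique(labels)
    scale = X.std(axis=0) + 1e-12  # guard against constant features
    means = {k: X[labels == k].mean(axis=0) for k in clusters}
    weights = {}
    for k, l in itertools.combinations(clusters, 2):
        score = np.abs(means[k] - means[l]) / scale
        weights[(int(k), int(l))] = soft_threshold(score, lam)
    return weights


# Toy data: 3 clusters of 30 samples, 5 features, all drawn from pure noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(90, 5))
labels = np.repeat([0, 1, 2], 30)
X[labels == 2, 0] += 6.0  # feature 0 separates cluster 2 from clusters 0 and 1

weights = pair_specific_weights(X, labels, lam=1.0)
for pair, w in weights.items():
    print(pair, np.round(w, 2))
```

In this toy setup, feature 0 receives a positive weight for the pairs (0, 2) and (1, 2) but a zero weight for the pair (0, 1), which is the kind of pair-level information a single global feature weight cannot express.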
Taxonomy
Topics: Gene expression and cancer classification · Advanced Clustering Algorithms Research · Face and Expression Recognition
Methods: Feature Selection
