TL;DR
This paper introduces an unsupervised method to identify a sparse set of features that discriminate between clusters in high-dimensional data, enabling better understanding and classification of complex biological systems.
Contribution
The authors propose a novel ensemble-based approach to discover low-dimensional discriminative features in high-dimensional data without supervision.
Findings
Identified 27 key transcription factors (out of 409 total) in single-cell RNA-seq data from mouse gastrulation.
Revealed clear cell type signatures in a low-dimensional subspace.
Outperformed prior methods in feature selection for clustering.
Abstract
Extracting an understanding of the underlying system from high-dimensional data is a growing problem in science. Discovering informative and meaningful features is crucial for clustering, classification, and low-dimensional data embedding. Here we propose to construct features based on their ability to discriminate between clusters of the data points. We define a class of problems in which linear separability of clusters is hidden in a low-dimensional space. We propose an unsupervised method to identify the subset of features that define a low-dimensional subspace in which clustering can be conducted. This is achieved by averaging over discriminators trained on an ensemble of proposed cluster configurations. We then apply our method to single-cell RNA-seq data from mouse gastrulation, and identify 27 key transcription factors (out of 409 total), 18 of which are known to define cell…
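The core idea in the abstract, averaging discriminators over an ensemble of proposed cluster configurations, can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how cluster configurations are proposed or which discriminator is used, so the sketch assumes 2-means on random feature subsets as the proposal step and a ridge-regularized Fisher linear discriminant as the discriminator. The function names (`two_means`, `fisher_weights`, `ensemble_feature_scores`), all parameter values, and the synthetic data are hypothetical choices made for illustration only.

```python
import numpy as np

def two_means(X, rng, n_iter=20):
    """Plain 2-means on X; returns a 0/1 label per row (assumed proposal step)."""
    centers = X[rng.choice(len(X), size=2, replace=False)]
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Squared distance of every point to both centers, shape (n, 2).
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for k in range(2):
            if np.any(labels == k):
                centers[k] = X[labels == k].mean(axis=0)
    return labels

def fisher_weights(X, labels, ridge=1e-3):
    """Normalized |weights| of a Fisher linear discriminant (assumed discriminator)."""
    a, b = X[labels == 0], X[labels == 1]
    # Within-class scatter, with a small ridge term to keep the solve stable.
    sw = np.cov(a, rowvar=False) + np.cov(b, rowvar=False)
    sw += ridge * np.eye(X.shape[1])
    w = np.linalg.solve(sw, b.mean(axis=0) - a.mean(axis=0))
    return np.abs(w) / np.abs(w).sum()

def ensemble_feature_scores(X, n_proposals=200, subset_size=5, seed=0):
    """Average discriminator weights over many proposed two-cluster splits."""
    rng = np.random.default_rng(seed)
    scores = np.zeros(X.shape[1])
    used = 0
    for _ in range(n_proposals):
        # Propose a cluster configuration from a random subset of features.
        subset = rng.choice(X.shape[1], size=subset_size, replace=False)
        labels = two_means(X[:, subset], rng)
        if min((labels == 0).sum(), (labels == 1).sum()) < 5:
            continue  # skip degenerate proposals
        # Train the discriminator on ALL features for this proposal.
        scores += fisher_weights(X, labels)
        used += 1
    return scores / max(used, 1)

# Synthetic data: 20 features, only features 0 and 1 separate the two clusters.
data_rng = np.random.default_rng(42)
n_samples, n_features = 300, 20
X = data_rng.normal(size=(n_samples, n_features))
true_labels = data_rng.integers(0, 2, size=n_samples)
X[:, :2] += 6.0 * true_labels[:, None]

scores = ensemble_feature_scores(X)
top_features = sorted(int(i) for i in np.argsort(scores)[-2:])
print("highest-scoring features:", top_features)
```

In this toy setting, proposals that happen to include an informative feature recover the true split, and their discriminators put consistent weight on the same few features, while proposals built on noise spread their weight over different features each time. Averaging therefore tends to concentrate the score on the discriminative subset, which is the intuition behind selecting a sparse set of features.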
