Active feature selection discovers minimal gene sets for classifying cell types and disease states with single-cell mRNA-seq data
Xiaoqiao Chen, Sisi Chen, Matt Thomson

TL;DR
This paper introduces ActiveSVM, an active learning method that identifies minimal, highly informative gene sets for classifying cell types and disease states in single-cell mRNA-seq data, with the aim of lowering sequencing costs for biological and clinical analyses.
Contribution
The paper presents a novel active feature selection approach, ActiveSVM, that efficiently finds small gene sets for accurate cell classification and biological insights in large single-cell datasets.
Findings
Achieves ~90% cell-type classification accuracy
Scales to datasets with over a million cells
Generalizes to genetic perturbation and spatial transcriptomics
Abstract
Sequencing costs currently prohibit the application of single-cell mRNA-seq to many biological and clinical analyses. Targeted single-cell mRNA-sequencing reduces sequencing costs by profiling reduced gene sets that capture biological information with a minimal number of genes. Here, we introduce an active learning method (ActiveSVM) that identifies minimal but highly-informative gene sets that enable the identification of cell-types, physiological states, and genetic perturbations in single-cell data using a small number of genes. Our active feature selection procedure generates minimal gene sets from single-cell data through an iterative cell-type classification task where misclassified cells are examined at each round of analysis to identify maximally informative genes through an 'active' support vector machine (ActiveSVM) classifier. By focusing computational resources on…
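The iterative procedure described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the excerpt above does not give the scoring rule, so the `gene_score` function below (a label-weighted, mean-centered sum over the cells in focus) is a hypothetical stand-in, and all function names, the toy data, and the restriction to two classes labeled +1 and -1 are assumptions made for illustration. The linear SVM is trained with plain subgradient descent on the hinge loss so the example needs only the standard library.

```python
def dot(w, x):
    return sum(wj * xj for wj, xj in zip(w, x))


def train_linear_svm(X, y, epochs=200, lr=0.01, lam=0.01):
    """Fit a linear SVM (hinge loss + L2 penalty) by full-batch subgradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [lam * wj for wj in w], 0.0
        for xi, yi in zip(X, y):
            if yi * (dot(w, xi) + b) < 1.0:  # cell violates the margin
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def gene_score(X, y, gene, cells):
    """Hypothetical score: how strongly a gene separates the labels on `cells`."""
    values = [X[i][gene] for i in cells]
    mean = sum(values) / len(values)
    return abs(sum(y[i] * (X[i][gene] - mean) for i in cells))


def select_genes(X, y, n_genes):
    """Greedily add one gene per round, scoring candidates on poorly classified cells."""
    n_cells, n_total = len(X), len(X[0])
    selected = []
    focus = list(range(n_cells))  # no classifier yet, so every cell is in focus
    for _ in range(min(n_genes, n_total)):
        candidates = [g for g in range(n_total) if g not in selected]
        best = max(candidates, key=lambda g: gene_score(X, y, g, focus))
        selected.append(best)
        # Retrain on the selected genes only, then find the margin violators.
        X_sub = [[row[g] for g in selected] for row in X]
        w, b = train_linear_svm(X_sub, y)
        focus = [i for i in range(n_cells) if y[i] * (dot(w, X_sub[i]) + b) < 1.0]
        if not focus:  # everything is classified with margin; fall back to all cells
            focus = list(range(n_cells))
    return selected


# Toy data (made up): 6 cells x 4 genes; only gene 2 differs between the classes.
X = [
    [1.0, 0.0, 5.0, 2.0],
    [1.0, 0.0, 4.0, 2.0],
    [1.0, 0.0, 5.0, 2.0],
    [1.0, 0.0, 0.0, 2.0],
    [1.0, 0.0, 1.0, 2.0],
    [1.0, 0.0, 0.0, 2.0],
]
y = [1, 1, 1, -1, -1, -1]

selected = select_genes(X, y, n_genes=2)
print(selected)  # gene 2 is chosen first because it is the only informative gene
```

The point of restricting the scoring step to the cells in `focus` is that each round spends its effort on the cells the current gene set still handles poorly, which mirrors the idea of examining misclassified cells at each round. A real analysis would use multi-class labels, normalized expression counts, and an established SVM library rather than this hand-written trainer.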
Taxonomy
Topics: Single-cell and spatial transcriptomics · Gene expression and cancer classification · Machine Learning and Algorithms
Methods: Feature Selection · Support Vector Machine
