# Integrating feature selection with unsupervised deep embedding for clustering single-cell RNA-seq data

**Authors:** Cheng Zhong, Siqi Jiang, Zhi Wei

PMC · DOI: 10.1093/bib/bbag082 · Briefings in Bioinformatics · 2026-03-02

## TL;DR

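This paper introduces FSSC, a method that clusters single-cell RNA-seq data by performing feature selection and clustering jointly in one framework, built on a zero-inflated negative binomial (ZINB) autoencoder with a group Lasso penalty and a clustering loss.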

## Contribution

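The contribution is FSSC (Feature Selection for scRNA-seq Clustering), a unified framework that performs feature selection and clustering jointly for scRNA-seq data, rather than treating gene selection as a separate preprocessing step.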

## Key findings

- FSSC consistently outperforms state-of-the-art methods in clustering accuracy on both simulated and real scRNA-seq datasets.
- The method selects a compact set of biologically meaningful, cluster-discriminatory marker genes.

## Abstract

Single-cell RNA sequencing (scRNA-seq) enables high-resolution analysis of gene expression at the individual cell level, with clustering serving as a critical step for identifying distinct cell populations. Due to the high dimensionality and sparsity of scRNA-seq data, existing approaches typically perform gene selection prior to clustering. However, treating feature selection as a separate preprocessing step can overlook latent clustering structure and often results in suboptimal outcomes, as it does not guarantee that the selected genes are informative for clustering. To address this limitation, we propose FSSC (Feature Selection for scRNA-seq Clustering), a unified framework for joint feature selection and clustering in scRNA-seq analysis. FSSC integrates a zero-inflated negative binomial (ZINB) autoencoder with a group Lasso penalty and a dedicated clustering loss. This joint optimization enables the model to simultaneously learn low-dimensional representations and select a compact set of cluster-discriminatory genes, preserving both the statistical characteristics of scRNA-seq data and its underlying cluster structure. Extensive experiments on both simulated and real scRNA-seq datasets demonstrate that FSSC consistently outperforms state-of-the-art methods in clustering accuracy and effectively identifies a compact, biologically meaningful set of marker genes.
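The three ingredients named in the abstract (a ZINB reconstruction term, a group Lasso penalty, and a clustering loss) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the group Lasso groups are the rows of the encoder's first-layer weight matrix, one row per input gene, which the abstract does not state. The clustering loss is passed in as a plain number because its form is not given here. The names `zinb_nll`, `group_lasso_penalty`, `selected_genes`, `joint_objective`, `lam`, and `gamma` are hypothetical.

```python
import math


def zinb_nll(x, mu, theta, pi, eps=1e-10):
    """Negative log-likelihood of one count x under a ZINB distribution.

    mu: negative binomial mean, theta: dispersion, pi: zero-inflation probability.
    """
    # Log-probability of x under the negative binomial component.
    log_nb = (
        math.lgamma(x + theta) - math.lgamma(theta) - math.lgamma(x + 1.0)
        + theta * math.log(theta / (theta + mu) + eps)
        + x * math.log(mu / (theta + mu) + eps)
    )
    if x == 0:
        # A zero can come from the inflation component or from the NB itself.
        return -math.log(pi + (1.0 - pi) * math.exp(log_nb) + eps)
    return -(math.log(1.0 - pi + eps) + log_nb)


def group_lasso_penalty(first_layer_weights):
    """Sum of L2 norms, one group per gene (assumed: row j = weights of gene j)."""
    return sum(math.sqrt(sum(w * w for w in row)) for row in first_layer_weights)


def selected_genes(first_layer_weights, tol=1e-6):
    """Indices of genes whose weight group has not been shrunk to (near) zero."""
    return [
        j for j, row in enumerate(first_layer_weights)
        if math.sqrt(sum(w * w for w in row)) > tol
    ]


def joint_objective(zinb_loss, cluster_loss, first_layer_weights, lam=1.0, gamma=1.0):
    """Assumed weighted sum of the three terms; lam and gamma are hypothetical."""
    return zinb_loss + gamma * cluster_loss + lam * group_lasso_penalty(first_layer_weights)


# Toy weights for three genes and two hidden units; gene 1 is fully zeroed out.
W = [[0.3, -0.4], [0.0, 0.0], [1.0, 0.0]]
print(selected_genes(W))
print(joint_objective(zinb_loss=2.0, cluster_loss=1.0, first_layer_weights=W, lam=0.1, gamma=0.5))
```

Because the group Lasso penalty acts on whole rows rather than on individual weights, it tends to push all weights of an uninformative gene to zero together, and that is what makes gene-level selection possible inside the same optimization that learns the embedding.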

## Full-text entities

- **Genes:** NAPSA (napsin A aspartic peptidase) [NCBI Gene 9476] {aka KAP, Kdap, NAP1, NAPA, NR1H2-AS1, SNAPA}, AQP4 (aquaporin 4) [NCBI Gene 361] {aka MIWC, MLC4, WCH4, hAQP4}, CXCL12 (C-X-C motif chemokine ligand 12) [NCBI Gene 6387] {aka IRH, PBSF, SCYB12, SDF1, TLSF, TPAR1}, OSR2 (odd-skipped related transcription factor 2) [NCBI Gene 116039], CLEC4F (C-type lectin domain family 4 member F) [NCBI Gene 165530] {aka CLECSF13, KCLR, KCR}, TTC36 (tetratricopeptide repeat domain 36) [NCBI Gene 143941] {aka HBP21}, CLEC4D (C-type lectin domain family 4 member D) [NCBI Gene 338339] {aka CD368, CLEC-6, CLEC6, CLECSF8, Dectin-3, MCL}, BMP5 (bone morphogenetic protein 5) [NCBI Gene 653], BMP4 (bone morphogenetic protein 4) [NCBI Gene 652] {aka BMP2B, BMP2B1, MCOPS6, OFC11, ZYME}, WNT2 (Wnt family member 2) [NCBI Gene 7472] {aka INT1L1, IRP}, RBP1 (retinol binding protein 1) [NCBI Gene 5947] {aka CRABP-I, CRBP, CRBP1, CRBPI, RBPC, hCRBP1}, JAG1 (jagged canonical Notch ligand 1) [NCBI Gene 182] {aka AGS, AGS1, AHD, AWS, CD339, CMT2HH}, LHX1 (LIM homeobox 1) [NCBI Gene 3975] {aka LIM-1, LIM1}
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12951082/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12951082/full.md

## References

39 references — full list in the complete paper: https://tomesphere.com/paper/PMC12951082/full.md

---
Source: https://tomesphere.com/paper/PMC12951082