# On feature selection to disentangle cell type and state transcriptional programs

**Authors:** Jiayi Wang, Helena L. Crowell, Mark D. Robinson

PMC · DOI: 10.1186/s12864-025-12085-9 · 2025-11-06

## TL;DR

This paper examines whether selecting features that separate cell type from cell state effects in single-cell data yields embeddings better suited to differential testing across conditions.

## Contribution

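A set of feature scoring and selection strategies that disentangle cell type and cell state transcriptional programs is evaluated, using a simulation framework with competing type and state effects and an experimental dataset.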

## Key findings

- Decoupling cell type and state features improves embedding spaces for differential testing.
- Type-focused embeddings yield more comparable results between clustering and neighborhood-based methods.
- Simulations with competing type and state effects, together with an experimental dataset, are used to evaluate the feature selection strategies and their downstream differential analysis results.

## Abstract

Single-cell omics approaches profile molecular constituents of individual cells. Replicated multi-condition experiments in particular aim at studying how the molecular makeup and composition of cell subpopulations change at the sample level. Two main approaches have been proposed for these tasks: firstly, cluster-based methods that group cells into (non-overlapping) subpopulations based on their molecular profiles and, secondly, cluster-free but neighborhood-based methods that identify (overlapping) groups of cells in consideration of cross-condition changes. In either approach, discrete cell groups are subjected to differential testing across conditions, and a low-dimensional cell embedding, which is in turn derived from a subset of selected features, is required to delineate subpopulations or neighborhoods. We hypothesized that decoupling differences in cell type (i.e., between subpopulations) and cell state (i.e., between conditions) for feature selection would yield an embedding space that captures different aspects of cellular heterogeneity, and that type-not-state embeddings would yield differential testing results that are more comparable between cluster- and neighborhood-based differential testing approaches. Our study leverages a simulation framework with competing type and state effects, as well as an experimental dataset, to evaluate a set of feature scoring and selection strategies, and to compare results from downstream differential analyses.

The online version contains supplementary material available at 10.1186/s12864-025-12085-9.
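
The workflow outlined in the abstract (score features for type and state effects, keep a type-not-state subset, then derive a low-dimensional embedding) can be illustrated with a minimal sketch. Everything here is an assumption for illustration rather than the authors' method: the function names `f_statistic`, `select_type_not_state`, and `pca_embedding` are hypothetical, the score is a simple difference of one-way ANOVA F-statistics, the embedding is plain PCA, and the data are toy values generated on the spot.

```python
import numpy as np


def f_statistic(x, groups):
    """One-way ANOVA F-statistic per feature (column of x) for the given labels."""
    labels = np.unique(groups)
    n, k = x.shape[0], labels.size
    grand_mean = x.mean(axis=0)
    ss_between = np.zeros(x.shape[1])
    ss_within = np.zeros(x.shape[1])
    for g in labels:
        xg = x[groups == g]
        mg = xg.mean(axis=0)
        ss_between += xg.shape[0] * (mg - grand_mean) ** 2
        ss_within += ((xg - mg) ** 2).sum(axis=0)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def select_type_not_state(x, cell_type, condition, n_features):
    """Hypothetical scoring rule: favor features that vary between cell types
    (subpopulations) and penalize features that vary between conditions."""
    type_score = f_statistic(x, cell_type)
    state_score = f_statistic(x, condition)
    score = type_score - state_score  # crude illustrative score, an assumption
    return np.argsort(-score)[:n_features]


def pca_embedding(x, n_components=2):
    """Low-dimensional embedding via PCA (SVD of the centered matrix)."""
    xc = x - x.mean(axis=0)
    u, s, _ = np.linalg.svd(xc, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


# Toy data: 200 cells x 6 features, with type and condition fully crossed.
rng = np.random.default_rng(0)
cell_type = np.repeat([0, 1], 100)   # two subpopulations
condition = np.tile([0, 1], 100)     # two conditions, balanced within each type
x = rng.normal(size=(200, 6))
x[:, :2] += 3.0 * cell_type[:, None]    # features 0-1 carry a type effect
x[:, 2:4] += 3.0 * condition[:, None]   # features 2-3 carry a state effect
# features 4-5 are pure noise

selected = select_type_not_state(x, cell_type, condition, n_features=2)
embedding = pca_embedding(x[:, selected])
print(sorted(selected.tolist()), embedding.shape)
```

In this toy setting the type-driven features receive large positive scores and the state-driven features large negative ones, so the embedding is built only from features that separate subpopulations. In practice cell type labels are not known in advance, so real strategies would need to score features against clusters or other proxies.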

## Full-text entities

- **Genes:** PKD2 (polycystin 2, transient receptor potential cation channel) [NCBI Gene 5311] {aka APKD2, PC2, PKD4, Pc-2, TRPP2}
- **Diseases:** Lupus (MESH:D008180), MDR (MESH:D018088), inflammation (MESH:D007249)
- **Chemicals:** miloDE (-)
- **Species:** Lemur (genus) [taxon 9446], Homo sapiens (human, species) [taxon 9606]

## Figures

2 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12593937/full.md

---
Source: https://tomesphere.com/paper/PMC12593937