# Crafted experiments to evaluate feature selection methods for single-cell RNA-seq data

**Authors:** Siyao Liu, David L Corcoran, Susana Garcia-Recio, James S Marron, Charles M Perou

PMC · DOI: 10.1093/nargab/lqaf023 · NAR Genomics and Bioinformatics · 2025-03-19

## TL;DR

This paper introduces crafted experiments, which perturb signals in a real dataset to benchmark gene selection methods for single-cell RNA-seq data, and GOF, a new suite of univariate distribution-oriented feature selection methods.

## Contribution

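The paper contributes a benchmarking framework, crafted experiments, that works around the lack of ground truth datasets by perturbing signals in a real dataset. It also contributes GOF, a suite of univariate distribution-oriented feature selection methods released as an open-source R package.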
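To make "univariate distribution-oriented" concrete, here is a minimal Python sketch that scores each gene on its own by how far its expression distribution departs from a fitted normal, using a Kolmogorov-Smirnov distance. This is an illustrative assumption, not the algorithm implemented in the GOF R package; the function names `ks_to_normal` and `rank_genes` and the simulated matrix are hypothetical.

```python
import numpy as np
from math import erf, sqrt

# Standard normal CDF, vectorized over arrays (assumption: normal reference distribution).
_norm_cdf = np.vectorize(lambda z: 0.5 * (1.0 + erf(z / sqrt(2.0))))

def ks_to_normal(values):
    """Kolmogorov-Smirnov distance between one gene's standardized values and N(0, 1)."""
    sd = values.std(ddof=1)
    if sd == 0:
        return 0.0  # a constant gene carries no distributional signal
    z = np.sort((values - values.mean()) / sd)
    n = z.size
    cdf = _norm_cdf(z)
    d_plus = np.max(np.arange(1, n + 1) / n - cdf)
    d_minus = np.max(cdf - np.arange(0, n) / n)
    return float(max(d_plus, d_minus))

def rank_genes(expr):
    """Score every gene (column) independently; return indices sorted by decreasing score."""
    scores = np.array([ks_to_normal(expr[:, j]) for j in range(expr.shape[1])])
    return np.argsort(scores)[::-1], scores

rng = np.random.default_rng(1)
expr = rng.normal(size=(300, 50))   # simulated stand-in: 300 cells x 50 genes
expr[:150, 0] += 6.0                # gene 0 becomes bimodal across two cell groups
order, scores = rank_genes(expr)
print("top-ranked gene:", order[0])
```

Because each gene is scored independently of the others, the selector needs no clustering or cell labels, which is the property the term "univariate" points to.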

## Key findings

- Crafted experiments effectively evaluate feature selection methods by perturbing real datasets.
- GOF selects features that robustly identify the crafted features and also performs well on real, non-crafted datasets.
- Varying the way signals are crafted reveals the context in which each GOF method performs best.

## Abstract

While numerous methods have been developed for analyzing scRNA-seq data, benchmarking these methods remains challenging. There is a lack of ground truth datasets for evaluating novel gene selection and/or clustering methods. We propose the use of crafted experiments, a new approach based upon perturbing signals in a real dataset for comparing analysis methods. We demonstrate the effectiveness of crafted experiments for evaluating a new suite of univariate distribution-oriented feature selection methods, called GOF. We show that GOF selects features that robustly identify crafted features and performs well on real non-crafted datasets. Using varying ways of crafting, we also show the context in which each GOF method performs best. GOF is implemented as an open-source R package and is freely available under the GPL-2 license at https://github.com/siyao-liu/GOF. Source code, including all functions for constructing crafted experiments and benchmarking feature selection methods, is publicly available at https://github.com/siyao-liu/CraftedExperiment.
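The idea of a crafted experiment can be sketched in a few lines of Python. The sketch assumes the simplest possible perturbation, an additive shift applied to chosen genes in a chosen subset of cells, and a plain variance-based selector as the method under test. The crafting scheme, the helper names (`craft_signal`, `top_k_by_variance`, `recovery_rate`), and the simulated matrix are illustrative assumptions and are not taken from the CraftedExperiment repository.

```python
import numpy as np

def craft_signal(expr, cell_idx, gene_idx, shift):
    """Return a copy of expr (cells x genes) with `shift` added to the chosen cells and genes."""
    crafted = expr.copy()
    crafted[np.ix_(cell_idx, gene_idx)] += shift
    return crafted

def top_k_by_variance(expr, k):
    """Baseline selector under test: indices of the k genes with the largest variance."""
    return np.argsort(expr.var(axis=0))[::-1][:k]

def recovery_rate(selected, crafted_genes):
    """Fraction of the crafted genes that the selector returned."""
    hits = set(selected.tolist()) & set(crafted_genes.tolist())
    return len(hits) / len(crafted_genes)

rng = np.random.default_rng(0)
# Stand-in for a real, normalized expression matrix (200 cells x 500 genes).
expr = rng.normal(loc=0.0, scale=1.0, size=(200, 500))
crafted_genes = np.arange(10)   # genes that receive the crafted signal
crafted_cells = np.arange(50)   # cells forming the crafted subpopulation
crafted = craft_signal(expr, crafted_cells, crafted_genes, shift=4.0)

selected = top_k_by_variance(crafted, k=10)
rate = recovery_rate(selected, crafted_genes)
print(f"recovered {rate:.0%} of crafted genes")
```

Because the crafted genes are known by construction, they act as the ground truth that real datasets lack: any selector can be swapped in for `top_k_by_variance` and compared on the same recovery rate, and the shift size or subpopulation size can be varied to probe where each selector works best.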

## Full-text entities

- **Genes:** CD14 (CD14 molecule) [NCBI Gene 929]; NCAM1 (neural cell adhesion molecule 1) [NCBI Gene 4684] {aka CD56, MSK39, NCAM}; CD4 (CD4 molecule) [NCBI Gene 920] {aka CD4mut, IMD79, Leu-3, OKT4D, T4}
- **Diseases:** Breast Cancer (MESH:D001943)
- **Species:** Mus musculus (house mouse, species) [taxon 10090], Homo sapiens (human, species) [taxon 9606]
- **Cell lines:** SUM149: Homo sapiens (Human), Breast inflammatory carcinoma, Cancer cell line (CVCL_3422); dermal fibroblasts: Mus musculus (Mouse), Spontaneously immortalized cell line (CVCL_U509); FVB3: Mus musculus (Mouse), Embryonic stem cell (CVCL_F046); MCF7: Homo sapiens (Human), Invasive breast carcinoma of no special type, Cancer cell line (CVCL_0031)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC11920870/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC11920870/full.md

## References

30 references — full list in the complete paper: https://tomesphere.com/paper/PMC11920870/full.md

---
Source: https://tomesphere.com/paper/PMC11920870