# A single cell RNAseq benchmark experiment embedding “controlled” cancer heterogeneity

**Authors:** Maddalena Arigoni, Maria Luisa Ratto, Federica Riccardo, Elisa Balmas, Lorenzo Calogero, Francesca Cordero, Marco Beccuti, Raffaele A. Calogero, Luca Alessandri

PMC · DOI: 10.1038/s41597-024-03002-y · Scientific Data · 2024-02-02

## TL;DR

This paper presents a benchmark scRNA-seq dataset built from lung cancer cell lines with seven known driver genes, intended for developing and validating methods that analyse cancer heterogeneity.

## Contribution

The contribution is a controlled scRNA-seq benchmark dataset in which the cancer heterogeneity is defined in advance by the cell lines used, so that bioinformatics tools can be validated against it.

## Key findings

- The dataset comprises lung cancer cell lines characterised by the expression of seven different driver genes (EGFR, ALK, MET, ERBB2, KRAS, BRAF, ROS1), which lead to partially overlapping functional pathways.
- It provides a framework for developing and testing methodologies to analyze cancer heterogeneity.
- It targets two known difficulties in scRNA-seq analysis, cell annotation and tumour subpopulation identification, by providing data against which such methods can be validated.

## Abstract

Single-cell RNA sequencing (scRNA-seq) has emerged as a vital tool in tumour research, enabling the exploration of molecular complexities at the individual cell level. It offers new technical possibilities for advancing tumour research with the potential to yield significant breakthroughs. However, deciphering meaningful insights from scRNA-seq data poses challenges, particularly in cell annotation and tumour subpopulation identification. Efficient algorithms are therefore needed to unravel the intricate biological processes of cancer. To address these challenges, benchmarking datasets are essential to validate bioinformatics methodologies for analysing single-cell omics in oncology. Here, we present a 10XGenomics scRNA-seq experiment, providing a controlled heterogeneous environment using lung cancer cell lines characterised by the expression of seven different driver genes (EGFR, ALK, MET, ERBB2, KRAS, BRAF, ROS1), leading to partially overlapping functional pathways. Our dataset provides a comprehensive framework for the development and validation of methodologies for analysing cancer heterogeneity by means of scRNA-seq.
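One way a benchmark of this kind can be used is to compare the clusters a method produces against the known cell line of each cell. The sketch below computes the adjusted Rand index, a common agreement score for two labelings, in plain Python. It is an illustration only and not the authors' evaluation procedure: the function name `adjusted_rand_index`, the six toy cells, and the cluster assignments are assumptions made for this example and are not values from the dataset.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(truth, predicted):
    """Adjusted Rand index between two labelings of the same cells."""
    if len(truth) != len(predicted):
        raise ValueError("labelings must have the same length")
    n = len(truth)
    # Contingency counts: number of cells sharing a (truth, predicted) pair.
    pair_counts = Counter(zip(truth, predicted))
    truth_counts = Counter(truth)
    pred_counts = Counter(predicted)

    # Count pairs of cells grouped together under each view.
    sum_pairs = sum(comb(c, 2) for c in pair_counts.values())
    sum_truth = sum(comb(c, 2) for c in truth_counts.values())
    sum_pred = sum(comb(c, 2) for c in pred_counts.values())
    total = comb(n, 2)

    # Agreement expected by chance, and the maximum attainable agreement.
    expected = sum_truth * sum_pred / total if total else 0.0
    max_index = (sum_truth + sum_pred) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings are one cluster
    return (sum_pairs - expected) / (max_index - expected)


# Toy data (hypothetical): six cells from three cell lines.
cell_line = ["A549", "A549", "PC9", "PC9", "HCC78", "HCC78"]
perfect = [0, 0, 1, 1, 2, 2]  # every cell line recovered as its own cluster
merged = [0, 0, 0, 0, 1, 1]   # two cell lines collapsed into one cluster

print(adjusted_rand_index(cell_line, perfect))
print(adjusted_rand_index(cell_line, merged))
```

On this toy input the perfect clustering scores 1.0 and the merged clustering scores about 0.44, so a method that fails to separate two cell lines is penalised even though no cell is placed with an unrelated group. In practice the true labels would come from whatever cell line annotation accompanies the dataset.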

## Linked entities

- **Genes:** EGFR (epidermal growth factor receptor) [NCBI Gene 1956], ALK (ALK receptor tyrosine kinase) [NCBI Gene 238], MET (MET proto-oncogene, receptor tyrosine kinase) [NCBI Gene 4233], ERBB2 (erb-b2 receptor tyrosine kinase 2) [NCBI Gene 2064], KRAS (KRAS proto-oncogene, GTPase) [NCBI Gene 3845], BRAF (B-Raf proto-oncogene, serine/threonine kinase) [NCBI Gene 673], ROS1 (ROS proto-oncogene 1, receptor tyrosine kinase) [NCBI Gene 6098]
- **Diseases:** lung cancer (MONDO:0005138)

## Full-text entities

- **Genes:** GEM (GTP binding protein overexpressed in skeletal muscle) [NCBI Gene 2669] {aka KIR}, ERBB2 (erb-b2 receptor tyrosine kinase 2) [NCBI Gene 2064] {aka CD340, HER-2, HER-2/neu, HER2, MLN 19, MLN-19}, EGFR (epidermal growth factor receptor) [NCBI Gene 1956] {aka ERBB, ERBB1, ERRP, HER1, NISBD2, NNCIS}, EML4 (EMAP like 4) [NCBI Gene 27436] {aka C2orf2, ELP120, EMAP-4, EMAPL4, ROPP120}, ALK (ALK receptor tyrosine kinase) [NCBI Gene 238] {aka ALK1, CD246, NBLST3}, SLC34A2 (solute carrier family 34 member 2) [NCBI Gene 10568] {aka NAPI-3B, NAPI-IIb, NPTIIb, NaPi2b, PULAM}, PTPN3 (protein tyrosine phosphatase non-receptor type 3) [NCBI Gene 5774] {aka PTP-H1, PTPH1}, KRAS (KRAS proto-oncogene, GTPase) [NCBI Gene 3845] {aka C-K-RAS, CFC2, K-RAS2A, K-RAS2B, K-RAS4A}, SLTM (SAFB like transcription modulator) [NCBI Gene 79811] {aka Met}, ROS1 (ROS proto-oncogene 1, receptor tyrosine kinase) [NCBI Gene 6098] {aka MCF3, ROS, c-ros-1}, BRAF (B-Raf proto-oncogene, serine/threonine kinase) [NCBI Gene 673] {aka B-RAF1, B-raf, BRAF-1, BRAF1, NS7, RAFB1}, MAP2K7 (mitogen-activated protein kinase kinase 7) [NCBI Gene 5609] {aka JNKK2, MAPKK7, MEK, MEK 7, MKK7, PRKMK7}
- **Diseases:** Cancer (MESH:D009369), NSCLC (MESH:D002289), lung cancer (MESH:D008175), Mycoplasma (MESH:D009175)
- **Species:** Homo sapiens (human, species) [taxon 9606]
- **Mutations:** p.G469A, p.G12D, p.G12S, L858R, p.V842I, T790M
- **Cell lines:** H2228 — Homo sapiens (Human), Lung adenocarcinoma, Cancer cell line (CVCL_1543), PC9 — Homo sapiens (Human), Lung adenocarcinoma, Cancer cell line (CVCL_B260), NCI — Homo sapiens (Human), Gastric adenocarcinoma, Cancer cell line (CVCL_0078), C4619J — Homo sapiens (Human), Bladder carcinoma, Cancer cell line (CVCL_M891), CRL5868 — Sigmodon hispidus (Hispid cotton rat), Spontaneously immortalized cell line (CVCL_YD58), NCI-H596 — Homo sapiens (Human), Lung adenosquamous carcinoma, Cancer cell line (CVCL_1571), H838 — Homo sapiens (Human), Lung adenocarcinoma, Cancer cell line (CVCL_1594), CCLE — Homo sapiens (Human), Induced pluripotent stem cell (CVCL_E025), HCC827 — Homo sapiens (Human), Lung adenocarcinoma, Cancer cell line (CVCL_2063), ACC 307 — Homo sapiens (Human), Induced pluripotent stem cell (CVCL_JR24), DV90 — Homo sapiens (Human), Lung adenocarcinoma, Cancer cell line (CVCL_1184), CSC — Gallus gallus (Chicken), Spontaneously immortalized cell line (CVCL_C3NY), HTB — Mus musculus (Mouse), Hybridoma (CVCL_A8FQ), ATCC-CCL-185 — Mus musculus (Mouse), Undefined cell line type (CVCL_M023), H1975 — Homo sapiens (Human), Lung adenocarcinoma, Cancer cell line (CVCL_1511), HTTB178 — Homo sapiens (Human), Induced pluripotent stem cell (CVCL_JU54), A549 — Homo sapiens (Human), Lung adenocarcinoma, Cancer cell line (CVCL_0023), NCI-H1395 — Homo sapiens (Human), Lung adenocarcinoma, Cancer cell line (CVCL_1467), ACC 563 — Homo sapiens (Human), Finite cell line (CVCL_X234), HCC78 — Homo sapiens (Human), Lung adenocarcinoma, Cancer cell line (CVCL_2061)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC10837414/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC10837414/full.md

## References

43 references — full list in the complete paper: https://tomesphere.com/paper/PMC10837414/full.md

---
Source: https://tomesphere.com/paper/PMC10837414