# BASiCS workflow: a step-by-step analysis of expression variability using single cell RNA sequencing data

**Authors:** Alan O'Callaghan, Nils Eling, John C. Marioni, Catalina A. Vallejos, Michel S Naslavsky, Andrew McDavid, Oliver M. Crook

PMC · DOI: 10.12688/f1000research.74416.1 · 2022-01-18

## TL;DR

This paper introduces a computational workflow using the BASiCS package to analyze gene expression variability in single-cell RNA sequencing data.

## Contribution

The paper contributes a step-by-step workflow that uses BASiCS to quantify expression variability within and between known groups of cells, while accounting for technical noise.

## Key findings

- Within a single, seemingly homogeneous cell population, BASiCS identifies highly variable genes (strong heterogeneity) and lowly variable genes (stable expression).
- The workflow includes preliminary quality control and data exploration using the scater and scran Bioconductor packages.
- A Docker image ensures reproducibility of the results.

## Abstract

Cell-to-cell gene expression variability is an inherent feature of complex biological systems, such as immunity and development. Single-cell RNA sequencing is a powerful tool to quantify this heterogeneity, but it is prone to strong technical noise. In this article, we describe a step-by-step computational workflow that uses the BASiCS Bioconductor package to robustly quantify expression variability within and between known groups of cells (such as experimental conditions or cell types). BASiCS uses an integrated framework for data normalisation, technical noise quantification and downstream analyses, propagating statistical uncertainty across these steps. Within a single seemingly homogeneous cell population, BASiCS can identify highly variable genes that exhibit strong heterogeneity as well as lowly variable genes with stable expression. BASiCS also uses a probabilistic decision rule to identify changes in expression variability between cell populations, whilst avoiding confounding effects related to differences in technical noise or in overall abundance. Using a publicly available dataset, we guide users through a complete pipeline that includes preliminary steps for quality control, as well as data exploration using the scater and scran Bioconductor packages. The workflow is accompanied by a Docker image that ensures the reproducibility of our results.
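The probabilistic decision rule mentioned in the abstract can be illustrated with a small, generic sketch. It assumes that, for one gene, we already have posterior draws of the log2 fold change in variability between two groups of cells, and it flags the gene when the posterior probability of a sufficiently large change reaches a cutoff. This is not the BASiCS implementation: the function names, the threshold `0.4` and the cutoff `0.9` are hypothetical values chosen for illustration, and the package's own rule and defaults may differ.

```python
import random


def tail_probability(draws, threshold):
    """Fraction of posterior draws whose absolute value exceeds `threshold`."""
    return sum(abs(d) > threshold for d in draws) / len(draws)


def classify_change(draws, threshold=0.4, prob_cutoff=0.9):
    """Label one gene from posterior draws of a log2 fold change.

    `threshold` and `prob_cutoff` are hypothetical illustration values.
    Returns a (label, probability) pair.
    """
    prob = tail_probability(draws, threshold)
    if prob < prob_cutoff:
        return "no change", prob
    # Direction is taken from the sign of the (upper) posterior median.
    median = sorted(draws)[len(draws) // 2]
    label = "higher in group 1" if median > 0 else "higher in group 2"
    return label, prob


if __name__ == "__main__":
    random.seed(1)
    # Simulated draws standing in for MCMC output (an assumption, not real data).
    simulated = [random.gauss(1.0, 0.2) for _ in range(1000)]
    print(classify_change(simulated))
```

Working with the full set of posterior draws, rather than a single point estimate, is what lets the uncertainty from earlier steps carry through to the final call for each gene.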

## Full-text entities

- **Genes:** Cd3e (CD3 antigen, epsilon polypeptide) [NCBI Gene 12501] {aka CD3, CD3epsilon, T3e}, Rps14 (ribosomal protein S14) [NCBI Gene 20044] {aka 2600014J02Rik}, Tbx21 (T-box 21) [NCBI Gene 57765] {aka TBT1, Tbet, Tblym}, Gata1 (GATA binding protein 1) [NCBI Gene 14460] {aka Gata-1, Gf-1, eryf1}, Cd4 (CD4 antigen) [NCBI Gene 12504] {aka L3T4, Ly-4}, Cd28 (CD28 antigen) [NCBI Gene 12487]
- **Datasets:** E-MTAB-4888 (ArrayExpress accession)
- **Species:** Mus musculus (house mouse, species) [taxon 10090], Mus musculus domesticus (western European house mouse, subspecies) [taxon 10092], Homo sapiens (human, species) [taxon 9606], Mus musculus castaneus (southeastern Asian house mouse, subspecies) [taxon 10091]

## Figures

18 figures with captions in the complete paper: https://tomesphere.com/paper/PMC11109695/full.md

---
Source: https://tomesphere.com/paper/PMC11109695