# Probability of stealth multiplets in sample-multiplexing for droplet-based single-cell analysis

**Authors:** Fumio Nakaki, James Sharpe

PMC · DOI: 10.1186/s12864-025-11835-z · 2025-07-23

## TL;DR

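This paper introduces a theoretical model for the probability of stealth multiplets, i.e. multiplets hidden among apparent singlets, in sample-multiplexed droplet-based scRNA-seq. Such multiplets can compromise a dataset when labelling or demultiplexing is suboptimal.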

## Contribution

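A simple theoretical model that estimates the probability of four classes of multiplets in mx-scRNA-seq (homogeneous stealth, partial stealth, multilabelled, and unlabelled), along with the term "stealth multiplet" for multiplets hidden among apparent singlets.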

## Key findings

- Partial stealth multiplets, previously overlooked, may affect the results of the whole dataset, particularly when labelling or demultiplexing is suboptimal.
- Stealth multiplets were shown to be present in real mx-scRNA-seq datasets under both oligonucleotide-barcode and genotype-based demultiplexing.
- Optimizing labelling and demultiplexing is crucial for ensuring data integrity in mx-scRNA-seq.

## Abstract

One of the technical limits of droplet-based single-cell RNA sequencing (scRNA-seq) is the presence of multiplets, i.e. droplets that capture multiple cells. Sample-multiplexing scRNA-seq (mx-scRNA-seq) enables us to evaluate large numbers of different samples or experiments simultaneously by reducing the occurrence of undetectable multiplets. However, there is still a possibility of hidden multiplets among what appear to be singlets, for which we introduce the term stealth multiplets, and their probability is yet to be quantitatively examined.

We developed a simple theoretical model to predict four classes of possible multiplets in mx-scRNA-seq: homogeneous stealth, partial stealth, multilabelled, and unlabelled. We estimated the probability of each class and found that the partial stealth multiplet, which has been previously overlooked, may impact the results of the whole dataset, particularly when the labelling process or demultiplexing is suboptimal. We also demonstrated their presence in real mx-scRNA-seq datasets in both oligonucleotide-barcode demultiplexing and genotype-based demultiplexing.
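To make the four classes concrete, here is a minimal Python sketch of how such probabilities could be computed for a droplet that is already known to contain several cells. It is an illustrative toy, not the authors' model. It assumes equally sized samples, one label per sample, and independent label detection per cell with probability `label_eff`. The class definitions in the comments are this sketch's own reading of the class names, and the function and parameter names are hypothetical.

```python
def multiplet_class_probabilities(n_samples, label_eff, n_cells=2):
    """Toy class probabilities for a droplet holding `n_cells` cells.

    Assumptions (not taken from the paper): samples are pooled in equal
    proportions, and each cell's sample label is detected independently
    with probability `label_eff`.
    """
    if n_samples < 1 or n_cells < 2 or not 0.0 <= label_eff <= 1.0:
        raise ValueError("need n_samples >= 1, n_cells >= 2, 0 <= label_eff <= 1")

    miss = 1.0 - label_eff

    # Unlabelled: no cell in the droplet shows a detectable label.
    unlabelled = miss ** n_cells

    # Exactly one distinct label detected: every cell is either undetected
    # or carries the same sample label, excluding the all-undetected case.
    one_label = n_samples * ((miss + label_eff / n_samples) ** n_cells - unlabelled)

    # Homogeneous stealth (assumed definition): all cells come from the same
    # sample and at least one label is detected.
    all_same_sample = float(n_samples) ** (1 - n_cells)
    homogeneous = all_same_sample * (1.0 - unlabelled)

    # Partial stealth (assumed definition): one distinct label is detected,
    # but the droplet holds cells from at least two samples.
    partial = one_label - homogeneous

    # Multilabelled: two or more distinct labels detected.
    multilabelled = 1.0 - unlabelled - one_label

    return {
        "homogeneous_stealth": homogeneous,
        "partial_stealth": partial,
        "multilabelled": multilabelled,
        "unlabelled": unlabelled,
    }


if __name__ == "__main__":
    for cls, p in multiplet_class_probabilities(n_samples=4, label_eff=0.9).items():
        print(f"{cls}: {p:.4f}")
```

Under these toy assumptions, the partial stealth class vanishes when label detection is perfect and grows as detection drops, which is consistent with the abstract's point that suboptimal labelling makes this class matter. For four samples and 90% detection, about 13.5% of doublets fall into it in this sketch.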

Our results show the importance of optimising the labelling procedure and choosing the most suitable demultiplexing algorithm. We thus offer a theoretical basis to estimate the probability of each type of multiplet to ensure the integrity of mx-scRNA-seq.

The online version contains supplementary material available at 10.1186/s12864-025-11835-z.

## Full-text entities

- **Genes:** Gem (GTP binding protein overexpressed in skeletal muscle) [NCBI Gene 14579], Sox9 (SRY (sex determining region Y)-box 9) [NCBI Gene 20682] {aka 2010306G03Rik, mKIAA4243, mSox9}, Lif (leukemia inhibitory factor) [NCBI Gene 16878]
- **Diseases:** NSCLC (MESH:D002289), UMAP (MESH:C567162)
- **Chemicals:** oligonucleotide (MESH:D009841), Cholesterol (MESH:D002784), EDTA (MESH:D004492), Streptomycin (MESH:D013307), L-Glutamine (MESH:D005973), lipid (MESH:D008055), CMO (-), Penicillin (MESH:D010406), PVA (MESH:D011142)
- **Species:** Mus musculus (house mouse, species) [taxon 10090], Homo sapiens (human, species) [taxon 9606]
- **Cell lines:** HEK293T — Homo sapiens (Human), Transformed cell line (CVCL_0063), GSE108313 — Konosirus punctatus (Dotted gizzard shad), Spontaneously immortalized cell line (CVCL_6F81), 3T3 — Mus musculus (Mouse), Spontaneously immortalized cell line (CVCL_0594), C57BL/6J — Mus musculus (Mouse), Transformed cell line (CVCL_C0MW), ES — Gallus gallus (Chicken), Somatic stem cell (CVCL_JE75), HL — Notophthalmus viridescens (Eastern newt), Spontaneously immortalized cell line (CVCL_B7RB)

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12285059/full.md

---
Source: https://tomesphere.com/paper/PMC12285059