# Ensemblex: an accuracy-weighted ensemble genetic demultiplexing framework for population-scale scRNAseq sample pooling

**Authors:** Michael R. Fiorini, Saeid Amiri, Allison A. Dilliott, Cristine M. Yde Ohki, Lukasz Smigielski, Susanne Walitza, Edward A. Fon, Edna Grünblatt, Rhalena A. Thomas, Sali M. K. Farhan

PMC · DOI: 10.1186/s13059-025-03643-1 · Genome Biology · 2025-07-03

## TL;DR

Ensemblex improves the accuracy of assigning pooled cells to their donor of origin in multiplexed single-cell RNA sequencing (scRNAseq) data, supporting cost-effective population-scale studies.

## Contribution

The paper introduces Ensemblex, an accuracy-weighted ensemble framework for genetic demultiplexing of pooled scRNAseq samples.

## Key findings

- Ensemblex integrates four distinct genetic demultiplexing algorithms to identify the most probable subject labels.
- The framework achieves higher demultiplexing accuracy than individual tools on both computationally and experimentally pooled samples.

## Abstract

Multiplexing samples from distinct individuals prior to sequencing is a promising step towards achieving population-scale single-cell RNA sequencing by reducing the restrictive costs of the technology. Individual genetic demultiplexing tools resolve the donor-of-origin identity of pooled cells using natural genetic variation but present diminished accuracy on highly multiplexed experiments, impeding the analytic potential of the dataset. In response, we introduce Ensemblex: an accuracy-weighted, ensemble genetic demultiplexing framework that integrates four distinct algorithms to identify the most probable subject labels. Using computationally and experimentally pooled samples, we demonstrate Ensemblex’s superior accuracy and illustrate the implications of robust demultiplexing on biological analyses.

The online version contains supplementary material available at 10.1186/s13059-025-03643-1.
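
To make the idea of accuracy weighting concrete, here is a minimal sketch of a weighted vote over per-cell donor calls. It is an illustration only, not the authors' implementation: this summary does not say how Ensemblex derives its weights or combines the four algorithms. The function name `weighted_vote`, the tool names `tool_a` to `tool_d`, and the accuracy values are all hypothetical.

```python
from collections import defaultdict


def weighted_vote(calls, weights):
    """Pick the donor label with the largest summed weight.

    calls:   dict mapping tool name -> donor label called for one cell
    weights: dict mapping tool name -> estimated accuracy of that tool
             (hypothetical values; how they are estimated is not covered here)
    Returns (best_label, share), where share is the fraction of the total
    weight that supports best_label.
    """
    scores = defaultdict(float)
    for tool, label in calls.items():
        scores[label] += weights[tool]
    total = sum(scores.values())
    # On an exact tie, max() keeps the label that was seen first.
    best = max(scores, key=scores.get)
    return best, scores[best] / total


# Hypothetical calls from four tools for a single cell.
calls = {"tool_a": "donor1", "tool_b": "donor2",
         "tool_c": "donor1", "tool_d": "donor2"}
# Hypothetical per-tool accuracy estimates used as vote weights.
weights = {"tool_a": 0.9, "tool_b": 0.6, "tool_c": 0.8, "tool_d": 0.7}

label, share = weighted_vote(calls, weights)
print(label, round(share, 3))
```

In this toy case the tools split two against two, so an unweighted majority vote would tie. Weighting by estimated accuracy resolves the call in favour of the label backed by the more accurate tools.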

## Full-text entities

- **Genes:** S100A1 (S100 calcium binding protein A1) [NCBI Gene 6271] {aka S100, S100-alpha, S100A}
- **Diseases:** Neurodevelopmental Disorder (MESH:D002658), NSCLC (MESH:D002289), NSC (MESH:D000092423), DaN (MESH:D009410), neurological diseases (MESH:D020271), ARI (MESH:D000275), cancer (MESH:D009369), ADHD (MESH:D001289), neurodegenerative disease (MESH:D019636), pT (OMIM:617450), PD (MESH:D010300)
- **Chemicals:** PBS (MESH:D007854), CMO (-)
- **Species:** Homo sapiens (human, species) [taxon 9606]
- **Cell lines:** FOUNDIN-PD — Mus musculus (Mouse), Hybridoma (CVCL_U609)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12224856/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12224856/full.md

## References

18 references — full list in the complete paper: https://tomesphere.com/paper/PMC12224856/full.md

---
Source: https://tomesphere.com/paper/PMC12224856