# Incorporating Scale Uncertainty into Differential Expression Analyses Using ALDEx2

**Authors:** Scott J. Dos Santos, Gregory B. Gloor

PMC · DOI: 10.1002/cpz1.70307 · 2026-02-04

## TL;DR

This protocol paper shows how to account for scale uncertainty in differential expression analyses of RNA-seq data, covering both bulk transcriptome and metatranscriptomic datasets, using the scale models built into the ALDEx2 R package.

## Contribution

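The paper provides reproducible examples of building scale models in ALDEx2, which account for uncertainty in the overall system scale during normalization and thereby reduce false discoveries in differential expression analyses. It also demonstrates the consequences of omitting them.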

## Key findings

- Using scale models in ALDEx2 reduces false-discovery rates in differential expression analyses.
- Failure to account for scale uncertainty leads to high false-discovery rates due to incorrect normalization assumptions.
- ALDEx2 outputs can be used for high-level data visualization through principal component analysis.

## Abstract

Differential abundance or expression analyses are routinely performed on metagenomic, metatranscriptomic, and amplicon sequencing data. In such datasets, analysts usually have no information regarding the true scale (i.e., size) of the microbial community or sample under study, and inter‐sample differences in sequencing depth are driven by technical variation rather than biological factors. Recent work has demonstrated that normalizations used in all analysis tools make incorrect assumptions about the biological scale of the system in question, leading to unacceptably high false‐discovery rates in the output. To mitigate this, analysts can acknowledge and account for the uncertainty of the overall system scale during normalization by building scale models of the data, a feature that has been integrated into the ALDEx2 R package. Here, we provide reproducible examples that demonstrate how to incorporate scale models into differential expression analyses of RNA‐seq data using bulk transcriptome and metatranscriptomic datasets, as well as the consequences of not doing so. We also show how to use the output of ALDEx2 to create high‐level exploratory visualizations of the data through principal component analysis. © 2026 The Author(s). Current Protocols published by Wiley Periodicals LLC.
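The idea of a scale model can be illustrated with a minimal Python sketch. It is not the ALDEx2 implementation (ALDEx2 is an R package and also resamples the proportions themselves, which is omitted here). The sketch assumes that a simple scale model perturbs the log-scale estimate implied by the centered log-ratio (CLR) normalization with Gaussian noise of standard deviation `gamma`. The function name `clr_with_scale_uncertainty` and its parameters are hypothetical.

```python
import math
import random


def clr_with_scale_uncertainty(counts, gamma, n_draws=128, seed=0):
    """Return n_draws log2 abundance vectors for a single sample.

    counts: strictly positive counts (add a pseudocount beforehand if needed).
    gamma:  assumed standard deviation of the Gaussian noise on the log2 scale;
            gamma = 0 reproduces the plain CLR transform.
    """
    rng = random.Random(seed)
    total = sum(counts)
    log_props = [math.log2(c / total) for c in counts]

    # The CLR implicitly assumes the log scale is exactly minus the mean
    # log proportion, i.e. it treats the scale as known without error.
    clr_log_scale = -sum(log_props) / len(log_props)

    draws = []
    for _ in range(n_draws):
        # Scale model (assumption of this sketch): add uncertainty to the
        # CLR scale estimate instead of trusting it as a point value.
        log_scale = clr_log_scale + rng.gauss(0.0, gamma)
        draws.append([lp + log_scale for lp in log_props])
    return draws


if __name__ == "__main__":
    for draw in clr_with_scale_uncertainty([10, 20, 70], gamma=0.5, n_draws=3):
        print([round(v, 3) for v in draw])
```

Within a draw, the differences between features are unchanged; only the common offset varies. That offset is what makes between-group comparisons less certain, and so more conservative, when the true scale is unknown.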

Basic Protocol 1: Using a simple scale model for differential expression analysis to avoid dual‐cutoff P value/significance thresholds

Basic Protocol 2: Implementing a full informed scale model to correct scale‐related data asymmetry in differential expression analyses

Basic Protocol 3: Visualizing ALDEx2 outputs using a compositional approach: Principal component analysis
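As a rough illustration of the compositional approach named in Basic Protocol 3, the sketch below CLR-transforms a small count table and computes each sample's score on the first principal component by power iteration. It is a standard-library Python sketch under stated assumptions, not the protocol's R code: the helper names `clr` and `first_pc_scores` are hypothetical, the counts are invented for illustration, and in practice the CLR values would come from ALDEx2 rather than from raw counts.

```python
import math
import random


def clr(counts):
    """Centered log-ratio transform of one sample (counts must be positive)."""
    logs = [math.log2(c) for c in counts]
    mean = sum(logs) / len(logs)
    return [v - mean for v in logs]


def first_pc_scores(samples, n_iter=200, seed=0):
    """Score each sample on the first principal component of the CLR values."""
    rows = [clr(s) for s in samples]
    n, p = len(rows), len(rows[0])

    # Center every feature across samples, as PCA requires.
    col_means = [sum(r[j] for r in rows) / n for j in range(p)]
    x = [[r[j] - col_means[j] for j in range(p)] for r in rows]

    # Random start: a uniform vector would be orthogonal to every CLR row.
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(p)]

    for _ in range(n_iter):
        scores = [sum(xi[j] * v[j] for j in range(p)) for xi in x]
        w = [sum(x[i][j] * scores[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(wj * wj for wj in w))
        if norm == 0.0:
            break  # no variance left to explain
        v = [wj / norm for wj in w]

    return [sum(xi[j] * v[j] for j in range(p)) for xi in x]


if __name__ == "__main__":
    # Invented counts: the first two samples differ from the last two
    # mainly in the first feature.
    table = [[100, 50, 50, 50], [110, 48, 52, 50],
             [10, 50, 50, 50], [12, 52, 49, 51]]
    print([round(s, 3) for s in first_pc_scores(table)])
```

The sign of a principal component is arbitrary, so only the relative positions of the samples along the axis are meaningful.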

## Full-text entities

- **Genes:** SNF2 (SWI/SNF catalytic subunit SNF2) [NCBI Gene 854465] {aka GAM1, HAF1, SWI2, TYE3}
- **Diseases:** bacterial vaginosis (BV) (MESH:D016585), cancer (MESH:D009369)
- **Chemicals:** nivolumab (MESH:D000077594), metronidazole (MESH:D008795)
- **Species:** Saccharomyces cerevisiae (baker's yeast, species) [taxon 4932], Homo sapiens (human, species) [taxon 9606]

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12871571/full.md

---
Source: https://tomesphere.com/paper/PMC12871571