# A mathematical framework to correct for compositionality in microbiome data sets

**Authors:** Samuel P. Forry, Stephanie L. Servetas, Jason G. Kralj, Monique E. Hunter, Jennifer N. Dootz, Scott A. Jackson

PMC · DOI: 10.1128/aem.01126-25 · Applied and Environmental Microbiology · 2026-01-06

## TL;DR

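This paper introduces an experimental design and mathematical framework that use exogenous internal standards to correct compositional distortions in metagenomic sequencing (MGS) data, yielding "Scaled Abundances" that are more accurate and precise than traditional relative abundances.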

## Contribution

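The paper contributes a mathematical framework that uses the observed relative abundance of internal standards to calculate taxon abundances that are independent of sample composition and directly proportional to actual biological abundances.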

## Key findings

- Scaled Abundances outperformed traditional relative abundance measurements in precision and accuracy.
- The method enables reliable, quantitative comparisons of microbiome taxa across different sample compositions.
- The approach is applicable to both amplicon and shotgun metagenomic sequencing analyses.

## Abstract

The increasing use of metagenomic sequencing (MGS) for microbiome analysis
has significantly advanced our understanding of microbial communities and
their roles in various biological processes, including human health,
environmental cycling, and disease. However, the inherent compositionality
of MGS data, where the relative abundance of each taxon depends on the
abundance of all other taxa, complicates the measurement of individual taxa
and the interpretation of microbiome data. Here, we describe an experimental
design that incorporates exogenous internal standards in routine MGS
analyses to correct for compositional distortions. A mathematical framework
was developed for using the observed internal standard relative abundance to
calculate “Scaled Abundances” for native taxa that were (i)
independent of sample composition and (ii) directly proportional to actual
biological abundances. Through analysis of mock community and human gut
microbiome samples, we demonstrate that Scaled Abundances outperformed
traditional relative abundance measurements in both precision and accuracy
and enabled reliable, quantitative comparisons of individual microbiome taxa
across varied sample compositions and across a wide range of taxon
abundances. By providing a pathway to accurate taxon quantification, this
approach holds significant potential for advancing microbiome research,
particularly in clinical and environmental health applications where precise
microbial profiling is critical.
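The idea of rescaling by an internal standard can be illustrated with a minimal sketch. The paper's actual equations are not reproduced in this summary, so the code below assumes the simplest form such a correction could take: each taxon's relative abundance divided by the internal standard's relative abundance, with the same amount of standard spiked into every sample. The function names, the `"IS"` label, and the read counts are all hypothetical and chosen only for illustration.

```python
# Minimal sketch (an assumption, not the paper's published equations):
# scale each taxon's relative abundance by the relative abundance of a
# spiked-in internal standard.

def relative_abundances(counts):
    """Convert raw read counts to fractions of the sample total."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {taxon: n / total for taxon, n in counts.items()}


def scaled_abundances(counts, standard="IS"):
    """Divide each native taxon's relative abundance by the standard's."""
    rel = relative_abundances(counts)
    rel_std = rel.get(standard, 0.0)
    if rel_std == 0:
        raise ValueError("internal standard not detected")
    return {t: r / rel_std for t, r in rel.items() if t != standard}


# Hypothetical read counts. Both samples received the same spike-in amount;
# taxon_A has the same true load in both, taxon_B is 10x higher in sample_2.
# sample_2 is also sequenced to a different depth.
sample_1 = {"IS": 1000, "taxon_A": 2000, "taxon_B": 1000}
sample_2 = {"IS": 500, "taxon_A": 1000, "taxon_B": 5000}

if __name__ == "__main__":
    for name, sample in (("sample_1", sample_1), ("sample_2", sample_2)):
        print(name, "relative:", relative_abundances(sample))
        print(name, "scaled:  ", scaled_abundances(sample))
```

In this toy data the relative abundance of `taxon_A` falls in `sample_2` only because `taxon_B` increased, while its scaled value is the same in both samples. Real data would also carry extraction, amplification, and sequencing biases that this sketch ignores.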

**Importance:** Metagenomic sequencing (MGS) analysis has become central to modern
characterizations of microbiome samples. However, the inherent
compositionality of these analyses, where the relative abundance of each
taxon depends on the abundance of all other taxa, often complicates
interpretations of results. We present here an experimental design and
corresponding mathematical framework that uses internal standards with
routine MGS methods to correct for compositional distortions. We
validate this approach for both amplicon and shotgun MGS analysis of
mock communities and human gut microbiome (fecal) samples. By using
internal standards to remove compositionality, we demonstrate
significantly improved measurement accuracy and precision for
quantification of taxon abundances. This approach is broadly applicable
across a wide range of microbiome research applications.
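Why an internal standard can remove compositionality may be easier to see in a short idealized derivation. The notation here is assumed for illustration and is not taken from the paper, whose full framework is not reproduced in this summary: $A_i$ is the true abundance of native taxon $i$, $A_{\mathrm{IS}}$ is the amount of internal standard added, $r$ denotes an observed relative abundance, and $S_i$ is a scaled value. The sketch also assumes unbiased sequencing, with no taxon-specific extraction or amplification effects.

$$
r_i = \frac{A_i}{A_{\mathrm{IS}} + \sum_{j} A_j},
\qquad
r_{\mathrm{IS}} = \frac{A_{\mathrm{IS}}}{A_{\mathrm{IS}} + \sum_{j} A_j}
$$

$$
S_i = \frac{r_i}{r_{\mathrm{IS}}} = \frac{A_i}{A_{\mathrm{IS}}}
$$

The shared denominator, which carries the dependence on every other taxon, cancels in the ratio. If the same amount of standard is added to each sample, $S_i$ under these assumptions is proportional to $A_i$ alone.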

## Full-text entities

- **Species:** Homo sapiens (human, species) [taxon 9606], gut metagenome (species) [taxon 749906]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12889883/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12889883/full.md

## References

50 references — full list in the complete paper: https://tomesphere.com/paper/PMC12889883/full.md

---
Source: https://tomesphere.com/paper/PMC12889883