# Strain-level metagenomic profiling using pangenome graphs with PanTax

**Authors:** Wenhai Zhang, Yuansheng Liu, Guangyi Li, Jialu Xu, Enlian Chen, Alexander Schönhuth, Xiao Luo

PMC · DOI: 10.1101/gr.280858.125 · Genome Research · 2026-02-01

## TL;DR

PanTax is a pangenome graph-based taxonomic profiler for metagenomic sequencing data. It classifies at strain resolution, works with both short and long reads, and in the authors' benchmarks achieves higher strain-level F1 scores than existing classifiers.

## Contribution

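PanTax introduces a pangenome graph-based approach to strain-level metagenomic profiling. By replacing collections of linear reference genomes with a graph that represents the variation shared among related genomes, it aims to reduce the ambiguity and bias of sequence-based classifiers while supporting short and long reads and single or multiple species.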

## Key findings

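- In the authors' benchmarks, PanTax achieves significantly higher strain-level F1 scores than state-of-the-art approaches, with comparable or better performance in other aspects across data sets.
- The method is compatible with both short and long reads and supports single or multiple species.
- Pangenome graphs represent the genetic variability across related genomes, whereas multiple linear reference genomes do not capture the sequence correlations among them, which can introduce ambiguity and bias into profiling.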
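
Since the headline result is reported as a strain-level F1 score, a minimal sketch of that metric may help. It assumes the evaluation compares the set of strains a profiler reports against the set truly present in a sample; the function name and strain labels are hypothetical, and the paper's exact evaluation protocol may differ.

```python
def strain_f1(predicted, truth):
    """Return (precision, recall, F1) for predicted vs. true strain IDs.

    Assumption: evaluation is presence/absence over sets of strain labels.
    """
    predicted, truth = set(predicted), set(truth)
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    if true_positives == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical sample: four strains present, four reported, three in common.
truth = {"strainA", "strainB", "strainC", "strainD"}
predicted = {"strainA", "strainB", "strainC", "strainX"}
precision, recall, f1 = strain_f1(predicted, truth)
print(precision, recall, f1)  # 0.75 0.75 0.75
```

Because F1 is a harmonic mean, a profiler cannot score well by reporting many candidate strains (high recall, low precision) or only a few safe ones (high precision, low recall).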

## Abstract

Microbes are omnipresent, thriving in a range of habitats, from oceans to soils, and even within our gastrointestinal tracts. They play a vital role in maintaining ecological equilibrium and promoting the health of their hosts. Consequently, understanding the diversity in terms of strains in microbial communities is crucial, as variations between strains can lead to different phenotypic expressions or diverse biological functions. However, current methods for taxonomic classification from metagenomic sequencing data have several limitations, including their reliance solely on species resolution, support for either short or long reads, or their confinement to a given single species. Most notably, most existing strain-level taxonomic classifiers rely on the sequence representation of multiple linear reference genomes, which fails to capture the sequence correlations among these genomes, potentially introducing ambiguity and biases in metagenomic profiling. Here, we present PanTax, a pangenome graph-based taxonomic profiler that overcomes the shortcomings of sequence-based approaches, because pangenome graphs possess the capability to depict the full range of genetic variability present across multiple evolutionarily or environmentally related genomes. PanTax provides a comprehensive solution to taxonomic classification for strain resolution, compatibility with both short and long reads, and compatibility with single or multiple species. Extensive benchmarking results demonstrate that PanTax drastically outperforms state-of-the-art approaches, primarily evidenced by its significantly higher F1 score at the strain level, while maintaining comparable or better performance in other aspects across various data sets.
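
The abstract's point about linear references failing to capture sequence correlations can be illustrated with a toy example. This is not PanTax's algorithm: the node IDs, sequences, strain names, and helper functions below are all hypothetical. The sketch only shows that a segment shared by two strains appears twice among linear references but once in a graph, where each strain is a path through shared and strain-specific nodes.

```python
# Toy pangenome graph: nodes hold sequence segments, each strain is a path.
nodes = {1: "ACGTAC", 2: "G", 3: "T", 4: "TTGACA"}
paths = {
    "strain_1": [1, 2, 4],  # carries the G variant
    "strain_2": [1, 3, 4],  # carries the T variant
}


def linear_genomes(nodes, paths):
    """Spell out each strain's linear genome by concatenating its path."""
    return {s: "".join(nodes[n] for n in p) for s, p in paths.items()}


def strains_through(node_id, paths):
    """List the strains whose path visits a given node."""
    return sorted(s for s, p in paths.items() if node_id in p)


genomes = linear_genomes(nodes, paths)

# A read drawn from the shared segment matches every linear reference
# separately, so the references alone do not say the hits are one locus.
read = "ACGTAC"
linear_hits = sorted(s for s, g in genomes.items() if read in g)

# In the graph the same segment is a single node; path membership records
# explicitly which strains share it and which nodes are strain-specific.
shared = strains_through(1, paths)
specific = strains_through(2, paths)

print(linear_hits)  # ['strain_1', 'strain_2']
print(shared)       # ['strain_1', 'strain_2']
print(specific)     # ['strain_1']
```

In this toy setting, only reads covering node 2 or node 3 are informative for telling the two strains apart, and the graph makes that distinction explicit rather than leaving it to be inferred from duplicate alignments.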

## Full-text entities

- **Genes:** SIM2 (SIM bHLH transcription factor 2) [NCBI Gene 6493] {aka HMC13F06, HMC29C01, SIM, bHLHe15}
- **Diseases:** infection (MESH:D007239), inflammatory bowel disease (MESH:D015212), intestinal diseases (MESH:D007410), PD (MESH:D010300)
- **Chemicals:** erythromycin (MESH:D004917)
- **Species:** Staphylococcus epidermidis (species) [taxon 1282], Bacteria Latreille et al. 1825 (Bacteria stick insect, genus) [taxon 629395], Vibrio cholerae (species) [taxon 666], Viruses (acellular root) [taxon 10239], Saccharomyces cerevisiae (baker's yeast, species) [taxon 4932], Streptococcus (genus) [taxon 1301], Homo sapiens (human, species) [taxon 9606]
- **Cell lines:** NIHLM001 — Homo sapiens (Human), Melanoma, Cancer cell line (CVCL_B4K8), NIHLM023 — Homo sapiens (Human), Induced pluripotent stem cell (CVCL_JQ65)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12863173/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12863173/full.md

## References

67 references — full list in the complete paper: https://tomesphere.com/paper/PMC12863173/full.md

---
Source: https://tomesphere.com/paper/PMC12863173