# Creating unity: linking 16S rRNA gene sequence information to the core taxonomy of genomes

**Authors:** Hilde Vinje, Knut Rudi, Lars Snipen

PMC · DOI: 10.1186/s40793-025-00789-0 · Environmental Microbiome · 2025-10-28

## TL;DR

This paper strengthens the link between the genome-based GTDB taxonomy and traditional 16S rRNA gene classification by estimating the 16S divergence thresholds that recover GTDB species and genera.

## Contribution

The study identifies optimal clustering thresholds for 16S rRNA sequences under the GTDB taxonomy to improve taxonomic resolution.

## Key findings

- Species-level resolution requires clustering 16S sequences at a divergence threshold of around 0.01 (99% identity).
- Genus-level resolution requires divergence thresholds of 0.04–0.08 (92–96% identity), and the optimal value varies significantly across branches.
- Fixed divergence thresholds are insufficient; adaptive methods are needed for accurate classification.

## Abstract

The Genome Taxonomy Database (GTDB) initiative aims to modernize prokaryotic taxonomy by aligning it with the large number of full-length genomes available today. Unfortunately, there is a poor link between the GTDB and the historically widely used 16S rRNA gene-based taxonomy. The current study explores the within- and between-taxon divergence of the 16S rRNA gene sequences under GTDB taxonomy, refining our understanding of the 16S gene’s resolution under this new taxonomic system. The analysis focuses on the divergence of 16S sequences collected from the GTDB genomes to identify optimal clustering thresholds for taxonomic resolution. Generalized linear mixed models (GLMMs) were fitted to estimate divergences within taxonomic ranks, correcting for the variable quality of the GTDB genomes.

To achieve GTDB species-level resolution, 16S sequences need clustering at a divergence threshold of around 0.01 (99% identity), while genus-level resolution requires thresholds of 0.04–0.08 (92–96% identity). Optimal thresholds vary significantly across branches, highlighting the limitations of using a fixed divergence threshold.
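The thresholds above pair each divergence with a percent identity (0.01 with 99%, 0.04–0.08 with 92–96%), which implies identity = (1 − divergence) × 100. The following Python sketch illustrates that conversion and a fixed-threshold check. The function names are hypothetical, and the simple mismatch fraction used for `p_distance` is an assumption for illustration only; it is not necessarily the divergence measure used in the paper.

```python
# Thresholds quoted in the abstract, expressed as fractional divergence.
SPECIES_THRESHOLD = 0.01            # about 99% identity
GENUS_THRESHOLD_RANGE = (0.04, 0.08)  # about 96% down to 92% identity


def divergence_to_identity(divergence: float) -> float:
    """Convert a fractional divergence (0..1) to percent identity."""
    if not 0.0 <= divergence <= 1.0:
        raise ValueError("divergence must be between 0 and 1")
    # Rounding guards against floating-point noise such as 91.99999999.
    return round((1.0 - divergence) * 100.0, 6)


def p_distance(seq_a: str, seq_b: str) -> float:
    """Fraction of mismatching positions between two sequences.

    Assumption: the sequences are already aligned and of equal length;
    gaps are treated like any other character.
    """
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and of equal length")
    mismatches = sum(a != b for a, b in zip(seq_a, seq_b))
    return mismatches / len(seq_a)


def within_threshold(divergence: float, threshold: float) -> bool:
    """True if a pair would be merged when clustering at `threshold`."""
    return divergence <= threshold


if __name__ == "__main__":
    for d in (SPECIES_THRESHOLD, *GENUS_THRESHOLD_RANGE):
        print(f"divergence {d:.2f} -> {divergence_to_identity(d):.1f}% identity")
```

A fixed cutoff like this is exactly what the abstract argues is insufficient on its own: a pair at divergence 0.05 would be merged at a genus threshold of 0.08 but split at 0.04, so the outcome depends on which value in the range suits the branch in question.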

The results suggest a more adaptive approach to taxonomic assignment from 16S data is needed, tailored to the diversity and complexity of the samples. These findings are fundamental for an improved taxonomic classification of environmental 16S data.

The online version contains supplementary material available at 10.1186/s40793-025-00789-0.

## Linked entities

- **Genes:** 16S rRNA (16S ribosomal RNA) [NCBI Gene 2597965]

## Full-text entities

- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12570451/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12570451/full.md

## References

4 references — full list in the complete paper: https://tomesphere.com/paper/PMC12570451/full.md

---
Source: https://tomesphere.com/paper/PMC12570451