# A curated bacterial and archaeal 16S rRNA Gene Oral Sequences dataset

**Authors:** Lara Vázquez-González, Alba Regueira-Iglesias, Carlos Balsa-Castro, Inmaculada Tomás, María J. Carreira

PMC · DOI: 10.1038/s41597-025-05050-4 · Scientific Data · 2025-05-02

## TL;DR

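This paper introduces 16SGOSeq, a curated dataset of 16S rRNA gene sequences, intragenomic copy numbers, and sequence variants extracted from complete genomes of the bacterial and archaeal species present in the human oral cavity. Niche-specific copy numbers of this kind are needed to avoid under- or overestimating taxa when microbial abundances are inferred from gene counts.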

## Contribution

The contribution is an oral-specific, curated dataset of bacterial and archaeal 16S rRNA gene sequences, their per-genome copy numbers, and their sequence variants, together with the pipeline used to build it.

## Key findings

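- The dataset covers 3,192 complete genomes of oral bacteria and 191 complete genomes of oral archaea.
- The 16S rRNA gene sequences were extracted from each genome, and their sequence variants were identified, so that both the number of 16S rRNA genes and their variants are recorded.
- The dataset and the pipeline used to construct it can be applied by clinical microbiologists, bioinformaticians, or microbial ecologists in future microbiome research.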
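
To make the idea of "gene copies and their variants" concrete, here is a minimal sketch of how the 16S rRNA genes of a single genome could be summarized. It assumes that a variant is simply a distinct sequence among the copies found in one genome; the paper's own pipeline may define and detect variants differently. The function name `summarize_16s_variants` and the toy sequences are hypothetical and are not taken from the dataset.

```python
from collections import Counter
from typing import Dict, List


def summarize_16s_variants(sequences: List[str]) -> Dict[str, object]:
    """Summarize the 16S rRNA gene copies found in one genome.

    Assumption (not from the paper): two copies are the same variant
    only if their sequences are identical after normalizing case and
    converting RNA 'U' to DNA 'T'.
    """
    normalized = [seq.upper().replace("U", "T") for seq in sequences]
    variant_counts = Counter(normalized)
    return {
        "copy_number": len(normalized),     # total 16S genes in the genome
        "n_variants": len(variant_counts),  # distinct sequences among them
        "variants": dict(variant_counts),   # sequence -> number of copies
    }


# Toy input: three gene copies, two of which are identical.
toy_copies = ["ACGTACGT", "acgtacgt", "ACGTACGA"]
summary = summarize_16s_variants(toy_copies)
print(summary["copy_number"], summary["n_variants"])
```

Exact-match grouping is the simplest possible definition; a real pipeline would also have to decide how to handle partial genes and ambiguous bases.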

## Abstract

In a given species, genomes and 16S rRNA gene sequences, along with their intragenomic copy numbers, can vary greatly across environments. The gene copy numbers are crucial for technologies which estimate microbial abundances based on gene counts, such as polymerase chain reaction and high-throughput sequencing. In these, taxa with fewer genes may be underestimated, while those with more genes might be overestimated. Therefore, it is essential to have accurate gene copy number databases specific to the niche under study. The 16S rRNA Gene Oral Sequences dataset (16SGOSeq) contains the number of 16S rRNA genes and their variants in the complete genomes of the bacterial and archaeal species present in the human oral cavity. It includes 3,192 complete genomes of oral bacteria and 191 complete genomes of oral archaea, from which the 16S rRNA gene sequences were extracted, and the sequence variants were identified. This oral-specific dataset of prokaryotic organisms and the pipeline followed for its construction can be applied by clinical microbiologists, bioinformaticians, or microbial ecologists in future microbiome research.
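
The over- and underestimation described in the abstract can be illustrated with a small sketch of copy-number correction: each taxon's 16S read count is divided by its 16S gene copy number before relative abundances are computed. This is a generic normalization shown for illustration, not the authors' code. The taxon names, read counts, and copy numbers below are invented, and the function `correct_for_copy_number` is hypothetical.

```python
from typing import Dict


def correct_for_copy_number(
    read_counts: Dict[str, float],
    copy_numbers: Dict[str, float],
) -> Dict[str, float]:
    """Return copy-number-corrected relative abundances.

    Each read count is divided by the taxon's 16S copy number, and the
    results are rescaled so that they sum to 1.
    """
    corrected = {}
    for taxon, count in read_counts.items():
        copies = copy_numbers[taxon]
        if copies <= 0:
            raise ValueError(f"copy number for {taxon} must be positive")
        corrected[taxon] = count / copies

    total = sum(corrected.values())
    if total == 0:
        return {taxon: 0.0 for taxon in corrected}
    return {taxon: value / total for taxon, value in corrected.items()}


# Invented example: taxon_A carries 6 copies of the gene, taxon_B carries 2.
reads = {"taxon_A": 600, "taxon_B": 400}
copies = {"taxon_A": 6, "taxon_B": 2}
abundances = correct_for_copy_number(reads, copies)
print(abundances)
```

In this toy case the raw reads suggest taxon_A is the majority (60%), but after correction it accounts for one third of the community, which is why the copy numbers used should match the niche under study.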

## Linked entities

- **Genes:** 16S rRNA (16S ribosomal RNA) [NCBI Gene 2597965]

## Full-text entities

- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12048654/full.md

## Figures

1 figure with captions in the complete paper: https://tomesphere.com/paper/PMC12048654/full.md

## References

22 references — full list in the complete paper: https://tomesphere.com/paper/PMC12048654/full.md

---
Source: https://tomesphere.com/paper/PMC12048654