# NEMESISdb: A full length 16S rRNA gene dataset for the detection of human, fish, and crustacean potentially pathogenic bacteria

**Authors:** Son-Hoang Tran, Claudia Ximena Restrepo-Ortiz, Dinh Quang Vu, Marc Troussellier, Yvan Bettarel, Thierry Bouvier, Van Ngoc Bui, Nguyen Hieu Minh, Trung Du Hoang, Quang Huy Nguyen, Jean-Christophe Auguet

PMC · DOI: 10.1016/j.dib.2025.112135 · 2025-10-06

## TL;DR

NEMESISdb is a curated dataset of full-length 16S rRNA gene sequences for identifying potentially pathogenic bacteria in humans, fish, and crustaceans.

## Contribution

NEMESISdb introduces a curated 16S rRNA gene dataset of potentially pathogenic bacteria for human, fish, and crustacean hosts, with a focus on marine and coastal environments.

## Key findings

- NEMESISdb includes over 150,000 curated full-length 16S rRNA gene sequences covering 1703 human, 222 fish, and 64 crustacean potentially pathogenic bacterial species.
- The dataset is optimized for use with BLAST and classifier tools for accurate detection in metagenomic and metabarcoding studies.
- NEMESISdb supports One Health research by linking pathogen circulation across environmental, animal, and human systems.
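NEMESISdb is described as ready for BLAST searches. As a minimal sketch of the downstream step, assuming reads or ASVs were searched against one of the host datasets with BLAST+ tabular output (`-outfmt 6`, default 12 columns), the Python below keeps the best-scoring hit per query that passes identity and alignment-length filters. The `subject_to_species` lookup, the function name `best_ppb_hits`, and the 99% identity / 200 bp thresholds are hypothetical choices for illustration, not values from the paper.

```python
import csv
from typing import Dict, Iterable, Tuple

# Default column order of BLAST+ tabular output (-outfmt 6).
OUTFMT6_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def best_ppb_hits(
    blast_lines: Iterable[str],
    subject_to_species: Dict[str, str],  # hypothetical: reference ID -> species
    min_identity: float = 99.0,          # hypothetical threshold (%)
    min_length: int = 200,               # hypothetical threshold (bp)
) -> Dict[str, Tuple[str, float]]:
    """Return {query_id: (species, pident)} for the highest-bitscore hit per
    query that passes the identity and alignment-length filters."""
    best: Dict[str, Tuple[float, str, float]] = {}  # query -> (bitscore, species, pident)
    for row in csv.reader(blast_lines, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        rec = dict(zip(OUTFMT6_COLUMNS, row))
        pident = float(rec["pident"])
        if pident < min_identity or int(rec["length"]) < min_length:
            continue
        species = subject_to_species.get(rec["sseqid"])
        if species is None:
            continue  # subject not present in the lookup table
        bitscore = float(rec["bitscore"])
        query = rec["qseqid"]
        if query not in best or bitscore > best[query][0]:
            best[query] = (bitscore, species, pident)
    return {q: (sp, pid) for q, (_, sp, pid) in best.items()}


if __name__ == "__main__":
    # Toy BLAST rows and lookup table (made-up IDs, for illustration only).
    rows = [
        "ASV1\tref_001\t99.6\t250\t1\t0\t1\t250\t10\t259\t1e-120\t455",
        "ASV1\tref_002\t99.2\t250\t2\t0\t1\t250\t10\t259\t1e-118\t449",
        "ASV2\tref_002\t97.0\t250\t7\t0\t1\t250\t10\t259\t1e-100\t400",
    ]
    lookup = {"ref_001": "Vibrio parahaemolyticus", "ref_002": "Vibrio harveyi"}
    print(best_ppb_hits(rows, lookup))  # {'ASV1': ('Vibrio parahaemolyticus', 99.6)}
```

Ranking by bitscore rather than percent identity favours longer alignments. With short amplicons, a single identity cutoff may not separate closely related species, so any threshold should be checked for the 16S region being sequenced.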

## Abstract

NEMESISdb is a curated dataset of full-length 16S rRNA gene sequences designed to enable the identification and tracking of potentially pathogenic bacteria (PPB) for human, fish, and crustacean hosts. It addresses the limited attention paid to marine and coastal environments as key reservoirs of PPB, where bacteria from diverse sources (terrestrial, marine, and animal) can coexist. Leveraging recent advances in high-throughput sequencing, NEMESISdb provides a robust resource for detecting PPB in 16S rRNA gene metabarcoding or metagenomic data. The database comprises three datasets, one per host (human, fish, and crustacean), containing 1703, 222, and 64 PPB species, respectively, and totalling over 150,000 curated full-length 16S rRNA gene sequences. It was constructed by extracting sequences from the SILVA 138.2 SSU Ref NR99 database and refining them through a rigorous curation pipeline to ensure taxonomic consistency and eliminate misclassifications. The resulting datasets are optimized for use with popular tools such as BLAST and taxonomic classifier software, enabling rapid and accurate detection of PPB in metabarcoding and metagenomic data. NEMESISdb supports diverse applications, including pathogen surveillance in aquatic ecosystems, studies of environmental factors influencing PPB dynamics, and the development of targeted strategies to mitigate pathogen impacts in aquaculture. It also facilitates research within the One Health framework by linking the circulation of PPB across environmental, animal, and human compartments.
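The abstract states that sequences were extracted from SILVA 138.2 SSU Ref NR99 and curated for taxonomic consistency, but the exact pipeline is not described in this summary. The Python below is therefore only an illustrative sketch of one such step, not the authors' method. It assumes SILVA-style FASTA headers of the form `>accession.start.stop rank1;rank2;...;genus;organism name`, keeps records whose organism label matches a target species list, and flags records whose organism name disagrees with the genus rank of the taxonomy path. The function names and the genus-agreement rule are hypothetical.

```python
import io
from typing import Iterable, Iterator, List, Optional, Set, Tuple

Record = Tuple[str, str]  # (header without '>', sequence)


def read_fasta(handle: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA lines."""
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in handle:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def extract_target_species(
    records: Iterable[Record], targets: Set[str]
) -> Tuple[List[Tuple[str, str, str]], List[str]]:
    """Split target-species records into consistent (kept) and inconsistent (flagged).

    Assumes a SILVA-style header: '<id> <rank1>;...;<genus>;<organism name>'.
    """
    kept: List[Tuple[str, str, str]] = []
    flagged: List[str] = []
    for header, seq in records:
        seq_id, _, taxonomy = header.partition(" ")
        ranks = [r.strip() for r in taxonomy.split(";")]
        if len(ranks) < 2 or not ranks[-1]:
            continue  # no usable organism label
        organism, genus_rank = ranks[-1], ranks[-2]
        # Reduce e.g. 'Vibrio harveyi strain X' to the binomial 'Vibrio harveyi'.
        species = " ".join(organism.split()[:2])
        if species not in targets:
            continue
        if organism.split()[0] != genus_rank:
            flagged.append(seq_id)  # organism name contradicts the genus rank
        else:
            kept.append((seq_id, species, seq))
    return kept, flagged


if __name__ == "__main__":
    # Toy records mimicking the SILVA header layout (not real SILVA entries).
    demo = io.StringIO(
        ">seq1.1.1500 Bacteria;Proteobacteria;Gammaproteobacteria;"
        "Enterobacterales;Vibrionaceae;Vibrio;Vibrio harveyi\nACGTACGT\n"
        ">seq2.1.1500 Bacteria;Proteobacteria;Gammaproteobacteria;"
        "Enterobacterales;Enterobacteriaceae;Escherichia-Shigella;Vibrio harveyi\nACGTTTGT\n"
        ">seq3.1.1500 Bacteria;Firmicutes;Bacilli;Lactobacillales;"
        "Streptococcaceae;Lactococcus;Lactococcus lactis\nGGGTACGT\n"
    )
    kept, flagged = extract_target_species(read_fasta(demo), {"Vibrio harveyi"})
    print(kept)     # [('seq1.1.1500', 'Vibrio harveyi', 'ACGTACGT')]
    print(flagged)  # ['seq2.1.1500']
```

Flagging rather than silently dropping inconsistent records leaves room for manual review, since a genus mismatch can reflect either a mislabelled sequence or a legitimate taxonomic revision.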

## Full-text entities

- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12553003/full.md

---
Source: https://tomesphere.com/paper/PMC12553003