# A comprehensive database for biological data derived from sewage in five European cities

**Authors:** Ágnes Becsei, Patrick Munk, Alessandro Fuschi, Saria Otani, József Stéger, Dávid Visontai, Krisztián Papp, Christian Brinch, Ravi Kant, Ilya Weinstein, Olli Vapalahti, Miranda de Graaf, Claudia M E Schapendonk, Jeroen Roelfsema, Maaike van den Beld, Roan Pijnacker, Eelco Franz, Patricia Alba, Antonio Battisti, Alessandra De Cesare, Valentina Indio, Fulvia Troja, Tarja Sironen, Chiara Oliveri, Frédérique Pasquali, Ivan Liachko, Benjamin Auch, Colman O’Cathail, Krisztián Bányai, Magdolna Makó, Péter Pollner, Marion Koopmans, Istvan Csabai, Daniel Remondini, Frank M Aarestrup

PMC · DOI: 10.1093/database/baaf089 · Database: The Journal of Biological Databases and Curation · 2026-01-20

## TL;DR

This paper introduces a detailed and accessible database of sewage metagenomic data from five European cities to support pathogen surveillance and microbial research.

## Contribution

The paper presents a highly curated, longitudinal sewage metagenomic dataset with extensive analytical outputs and a public PostgreSQL database for efficient data reuse.

## Key findings

- The dataset includes 239 sewage samples with taxonomic profiles, antimicrobial resistance genes, and metagenome-assembled genomes.
- Hi-C sequencing was used on a subset of samples to improve genomic linkage analysis.
- A PostgreSQL database was created to enable efficient querying and subsetting of the data.

## Abstract

Sewage metagenomics is a powerful tool for proactive pathogen surveillance and understanding microbial community dynamics. To support such efforts, we present a highly curated and accessible longitudinal dataset of 239 sewage samples collected from five European cities. The dataset, processed through metagenomic sequencing, includes rich analytical outputs such as taxonomic profiles, identified antimicrobial resistance genes, assembled contigs with annotated origins, metagenome-assembled genomes with functional gene annotations, and metadata. Given the computational intensity and time required to reproduce such analyses, we share this dataset to promote reuse and advance research. In addition to the metagenomic data, qPCR was used to identify specific pathogens, and Hi-C sequencing was performed on a subset of the samples to strengthen genomic linkage analysis. Central to this resource is a publicly available PostgreSQL database, designed to facilitate efficient exploration and reuse of the data. This comprehensive database allows users to perform targeted queries, subset data, and streamline access to this extensive resource.
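The abstract describes a PostgreSQL database that supports targeted queries and subsetting. The sketch below shows what such a query might look like. It is an illustration under stated assumptions, not the authors' interface. It uses Python's built-in `sqlite3` module as a local stand-in for PostgreSQL. The two-table schema (`sample`, `amr_abundance`), its column names, the city labels, and every row of data are hypothetical toy values. The real database's tables, columns, and connection details are not described in this summary.

```python
import sqlite3

# HYPOTHETICAL schema: table and column names are placeholders, not the
# published database's actual layout.
SCHEMA = """
CREATE TABLE sample (
    sample_id TEXT PRIMARY KEY,
    city TEXT NOT NULL,
    collection_date TEXT NOT NULL  -- ISO 8601 date string
);
CREATE TABLE amr_abundance (
    sample_id TEXT NOT NULL REFERENCES sample(sample_id),
    gene TEXT NOT NULL,
    reads INTEGER NOT NULL
);
"""


def build_toy_db() -> sqlite3.Connection:
    """Create an in-memory stand-in database filled with made-up toy rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO sample VALUES (?, ?, ?)",
        [("s1", "city_A", "2022-01-10"),
         ("s2", "city_A", "2022-02-14"),
         ("s3", "city_B", "2022-01-11")],
    )
    conn.executemany(
        "INSERT INTO amr_abundance VALUES (?, ?, ?)",
        [("s1", "amr_gene_1", 120), ("s1", "amr_gene_2", 30),
         ("s2", "amr_gene_1", 80), ("s3", "amr_gene_1", 500)],
    )
    return conn


def amr_totals_for_city(conn, city, start, end):
    """Subset samples by city and date range, then sum reads per AMR gene."""
    query = """
        SELECT a.gene, SUM(a.reads) AS total_reads
        FROM amr_abundance AS a
        JOIN sample AS s ON s.sample_id = a.sample_id
        WHERE s.city = ? AND s.collection_date BETWEEN ? AND ?
        GROUP BY a.gene
        ORDER BY total_reads DESC
    """
    return conn.execute(query, (city, start, end)).fetchall()


conn = build_toy_db()
print(amr_totals_for_city(conn, "city_A", "2022-01-01", "2022-12-31"))
# prints [('amr_gene_1', 200), ('amr_gene_2', 30)]
```

Against a real PostgreSQL server, you would use a driver such as psycopg2 instead of `sqlite3`. That driver expects `%s` placeholders rather than `?`. The SQL pattern would stay the same: filter the sample metadata first, then join and aggregate the analytical tables.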

## Full-text entities

- **Diseases:** COVID-19 (MESH:D000086382), infectious disease (MESH:D003141), polio (MESH:D011051), AMR (MESH:D060467)
- **Chemicals:** tetracycline (MESH:D013752), beta-lactam (MESH:D047090), formaldehyde (MESH:D005557), aminoglycoside (MESH:D000617), biotin (MESH:D001710), glycine (MESH:D005998)
- **Species:** Homo sapiens (human, species) [taxon 9606], Giardia duodenalis (species) [taxon 5741], Severe acute respiratory syndrome coronavirus 2 (no rank) [taxon 2697049]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12817144/full.md

## Figures

2 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12817144/full.md

## References

33 references — full list in the complete paper: https://tomesphere.com/paper/PMC12817144/full.md

---
Source: https://tomesphere.com/paper/PMC12817144