# VirJenDB: a FAIR (meta)data and bioinformatics platform for all viruses

**Authors:** Shahram Saghaei, Malte Siemers, Kilian L Ossetek, Stephan Richter, Robert A Edwards, Simon Roux, Andrzej Zielezinski, Bas E Dutilh, Manja Marz, Noriko A Cassman

PMC · DOI: 10.1093/nar/gkaf1224 · Nucleic Acids Research · 2025-12-17

## TL;DR

VirJenDB is a platform that curates and provides access to virus data and metadata from multiple sources, supporting research on both eukaryotic and prokaryotic viruses.

## Contribution

VirJenDB introduces a community-driven, FAIR-compliant platform for virus data curation and analysis. It integrates metadata from 16 sources and gives researchers search, filtering, download, and visualization tools through a web portal and an API.
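Integrating metadata from many sources means mapping each source's field names onto one curated schema. The following is a minimal sketch of that harmonization step. It assumes a per-source lookup table and keeps provenance for each curated value. The source labels, field names, `FIELD_MAP`, `CuratedRecord`, and `harmonize` are all hypothetical illustrations and are not VirJenDB's actual schema or code.

```python
from dataclasses import dataclass, field

# Hypothetical mapping from source-specific field names to curated field names.
# Source labels and field names are illustrative; they are not VirJenDB's schema.
FIELD_MAP = {
    "source_a": {"Host": "host", "collection_date": "collection_date", "country": "geo_location"},
    "source_b": {"lab_host": "host", "date": "collection_date", "geo_loc_name": "geo_location"},
}


@dataclass
class CuratedRecord:
    accession: str
    fields: dict = field(default_factory=dict)
    # Records which source field each curated value came from.
    provenance: dict = field(default_factory=dict)


def harmonize(accession: str, source: str, raw: dict) -> CuratedRecord:
    """Map one source record onto the curated field names (hypothetical helper)."""
    mapping = FIELD_MAP[source]
    record = CuratedRecord(accession)
    for raw_key, value in raw.items():
        curated_key = mapping.get(raw_key)
        # Unmapped keys and empty values are skipped in this sketch.
        if curated_key is None or value in (None, ""):
            continue
        record.fields[curated_key] = value.strip() if isinstance(value, str) else value
        record.provenance[curated_key] = f"{source}:{raw_key}"
    return record


rec = harmonize("ACC0001", "source_b",
                {"lab_host": " Escherichia coli ", "geo_loc_name": "Germany", "notes": "n/a"})
print(rec.fields)
```

The sketch stores provenance next to each value so that a later curation round can trace every curated field back to the source field it came from. This matters when the same concept, such as the host, arrives under different names.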

## Key findings

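- VirJenDB links up to 85 curated metadata fields to 15.4 million virus sequences. Of these sequences, 88% come from viruses infecting eukaryotes and the remainder from viruses infecting prokaryotes.
- A novel collection of 0.91 million viral operational taxonomic unit (vOTU) sequences spanning all viruses was created. The original member sequences of each vOTU are retained to support downstream analyses such as sequence variation.
- The platform provides HTTPS and API access for searching, filtering, downloading, and visualizing the data.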
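The percentages in the first finding can be converted into approximate absolute counts. The quick check below assumes that the rounded 88% figure applies to the full set of 15.4 million sequences, so the resulting counts are estimates and not reported values.

```python
# Reported totals (rounded in the source); derived counts are approximations.
TOTAL_SEQUENCES = 15.4e6
EUKARYOTIC_FRACTION = 0.88

eukaryotic = TOTAL_SEQUENCES * EUKARYOTIC_FRACTION
prokaryotic = TOTAL_SEQUENCES - eukaryotic

print(f"eukaryotic ~ {eukaryotic / 1e6:.2f} M, prokaryotic ~ {prokaryotic / 1e6:.2f} M")
```

The result is roughly 13.55 million sequences from viruses of eukaryotes and 1.85 million from viruses of prokaryotes.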

## Abstract

High-throughput sequencing has generated an unprecedented volume of data. However, researcher-submitted data in repositories require extensive curation and quality control before they can be reused. These tasks are hindered by the multiplicity of repositories, the sheer volume of the data, and the complexity of virus (meta)data curation. To address these challenges, VirJenDB offers a user-friendly platform for versioned, community-driven curation and ontology development. Virus sequences were ingested from 16 sources, along with ~200 fields of metadata or standards covering taxonomy, sample, and host information. Up to 85 metadata fields have undergone at least one round of curation and are linked to 15.4 million virus sequences. Of these, 88% come from viruses infecting eukaryotes and the remainder from viruses infecting prokaryotes. Subsets were created, including a novel collection of 0.91 million viral operational taxonomic unit (vOTU) sequences across all viruses. The original sequences of each vOTU were retained to facilitate downstream analyses such as sequence variation. The VirJenDB web portal (https://www.virjendb.org) provides HTTPS and Application Programming Interface (API) access to the sequence datasets and metadata, offering a search engine, filtering, download, visualizations, and documentation. VirJenDB aims to connect the phage and eukaryotic virus research communities by supporting webtool integration, meta-analyses, and metadata schema extensions.
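The vOTU subset groups sequences into clusters and still keeps every member sequence. The following minimal sketch shows one way such a collection could be built. It assumes greedy centroid clustering over precomputed pairwise average nucleotide identity (ANI) and aligned-fraction values. It also uses the commonly cited thresholds of 95% ANI over 85% aligned fraction. This summary does not state VirJenDB's actual clustering method or parameters. The function `greedy_votu_clusters` and its input format are hypothetical.

```python
# Assumed thresholds (a common vOTU convention); VirJenDB's parameters may differ.
MIN_ANI = 95.0
MIN_AF = 85.0


def greedy_votu_clusters(lengths, pairs, min_ani=MIN_ANI, min_af=MIN_AF):
    """Greedy centroid clustering (hypothetical sketch).

    lengths: {seq_id: sequence length}
    pairs:   {(representative_id, member_id): (ANI %, aligned fraction of member %)},
             assumed to be precomputed by an external aligner.
    Returns {representative_id: [representative_id, member_id, ...]}.
    """
    # Longest sequences are considered first, so they become representatives.
    order = sorted(lengths, key=lambda s: (-lengths[s], s))
    assigned = set()
    clusters = {}
    for rep in order:
        if rep in assigned:
            continue
        clusters[rep] = [rep]
        assigned.add(rep)
        for other in order:
            if other in assigned:
                continue
            ani, af = pairs.get((rep, other), (0.0, 0.0))
            if ani >= min_ani and af >= min_af:
                # Keep the member ID so the original sequence stays available.
                clusters[rep].append(other)
                assigned.add(other)
    return clusters


lengths = {"s1": 40000, "s2": 39500, "s3": 12000, "s4": 41000}
pairs = {("s4", "s1"): (97.2, 96.0), ("s4", "s2"): (96.1, 90.0), ("s4", "s3"): (80.0, 30.0)}
print(greedy_votu_clusters(lengths, pairs))
```

The function returns the full member list for each cluster and not only the representative. This mirrors the abstract's point that retaining the original sequences of each vOTU enables downstream analyses such as studying sequence variation within a cluster.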


## Full-text entities

- **Species:** Staphylococcus aureus (species) [taxon 1280], Homo sapiens (human, species) [taxon 9606], Listeria monocytogenes (species) [taxon 1639], Human immunodeficiency virus (species) [taxon 12721], Viruses (acellular root) [taxon 10239], Gallus gallus (bantam, species) [taxon 9031], Influenza A virus (no rank) [taxon 11320], Coronaviridae (family) [taxon 11118], Bos taurus (bovine, species) [taxon 9913], Rotavirus A (no rank) [taxon 28875], Klebsiella pneumoniae (species) [taxon 573], Hepatitis C virus [taxon 11103], Escherichia coli (E. coli, species) [taxon 562], Hepatitis B virus (no rank) [taxon 10407], Human immunodeficiency virus 1 (no rank) [taxon 11676], Sus scrofa (pig, species) [taxon 9823], Orthohepacivirus (genus) [taxon 11102], Salmonella enterica (species) [taxon 28901], Severe acute respiratory syndrome coronavirus 2 (no rank) [taxon 2697049]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12807664/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12807664/full.md

## References

38 references — full list in the complete paper: https://tomesphere.com/paper/PMC12807664/full.md

---
Source: https://tomesphere.com/paper/PMC12807664