# ParaRef: a decontaminated reference database for parasite detection in ancient and modern metagenomic datasets

**Authors:** Jonas Niemann, Yuejiao Huang, Liam T. Lanigan, Arve L. Willingham Grijalba, Robert R. Dunn, Martin Sikora, Hannes Schroeder

PMC · DOI: 10.1186/s13059-025-03818-w · Genome Biology · 2025-10-23

## TL;DR

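This paper introduces ParaRef, a curated reference database built by quantifying and removing contamination from 831 published endoparasite genomes. Using the decontaminated references reduces false detections and improves the accuracy of species-level parasite detection in metagenomic data.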

## Contribution

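The contribution is twofold: a systematic quantification of contamination in published endoparasite genomes, and ParaRef, the decontaminated reference database for species-level parasite detection in metagenomic datasets that results from removing it.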

## Key findings

- Decontamination of parasite genomes significantly reduces false detection rates.
- ParaRef improves overall detection accuracy for parasite species in metagenomic data.

## Abstract

Shotgun metagenomics holds great potential for identifying parasite DNA in biological samples, but its effectiveness is limited by widespread contamination in publicly available reference genomes, which hinders accurate detection. In this study, we systematically quantify and remove contamination from 831 published endoparasite genomes to create ParaRef, a curated reference database for species-level parasite detection. We show that decontamination significantly reduces false detection rates and improves overall detection accuracy. Our study highlights the pervasive issue of contamination in public databases and offers a resource that will enhance the reliability of parasite detection using metagenomics.

The online version contains supplementary material available at 10.1186/s13059-025-03818-w.
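As a rough illustration of what removing contamination from a reference genome can look like in practice, the sketch below drops contigs that an upstream screen has already flagged as contaminant and reports the fraction of bases removed. It is a minimal sketch under stated assumptions, not the authors' pipeline: the function names `parse_fasta` and `remove_flagged_contigs`, the contig IDs, and the idea of a pre-computed set of flagged IDs are all hypothetical and are not taken from the paper.

```python
from typing import Dict, Iterable, Iterator, Set, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def remove_flagged_contigs(
    records: Iterable[Tuple[str, str]],
    flagged_ids: Set[str],
) -> Tuple[Dict[str, str], float]:
    """Drop flagged contigs; return kept contigs and the removed base fraction.

    `flagged_ids` is assumed to come from some external contamination
    screen (hypothetical here); this function does no detection itself.
    """
    kept: Dict[str, str] = {}
    removed_bp = 0
    total_bp = 0
    for header, seq in records:
        # The contig ID is taken as the first whitespace-delimited token.
        contig_id = header.split()[0]
        total_bp += len(seq)
        if contig_id in flagged_ids:
            removed_bp += len(seq)
        else:
            kept[contig_id] = seq
    fraction_removed = removed_bp / total_bp if total_bp else 0.0
    return kept, fraction_removed


if __name__ == "__main__":
    # Toy assembly with one contig flagged as contaminant (made-up data).
    toy = [">c1 parasite", "ACGT", "ACGT", ">c2 bacterial", "GGCC", ">c3", "TTTTTTTT"]
    kept, frac = remove_flagged_contigs(parse_fasta(toy), {"c2"})
    print(sorted(kept), round(frac, 2))
```

Tracking the removed fraction per genome is one simple way to quantify contamination alongside removing it; whole-contig removal is only one possible design, and a real workflow might instead mask or trim contaminated regions within a contig.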

## Full-text entities

- **Genes:** cytochrome c oxidase subunit I [NCBI Gene 12354277]
- **Diseases:** schistosomiasis (MESH:D012552), Enterobius vermicularis (MESH:D017229), Ancylostoma ceylanicum (MESH:C538433), Schistosoma mansoni infection (MESH:D012555), gastrointestinal parasites (MESH:D005767), infection (MESH:D007239), hookworm (MESH:D006725), death (MESH:D003643), malaria (MESH:D008288), parasite infections (MESH:D010272)
- **Chemicals:** water (MESH:D014867), cytosine (MESH:D003596), GX (MESH:C001306), FCS (-)
- **Species:** Caenorhabditis elegans (species) [taxon 6239], Mus musculus (house mouse, species) [taxon 10090], Platyhelminthes (flatworm, phylum) [taxon 6157], Trichuris trichiura (human whipworm, species) [taxon 36087], Baylisascaris schroederi (giant panda roundworm, species) [taxon 522413], Dasyprocta punctata (Central American agouti, species) [taxon 34846], Onchocerca flexuosa (species) [taxon 387005], Trichomonas vaginalis (species) [taxon 5722], Canis lupus familiaris (dog, subspecies) [taxon 9615], Lepeophtheirus salmonis (salmon louse, species) [taxon 72036], Ancylostoma ceylanicum (species) [taxon 53326], Toxocara canis (dog roundworm, species) [taxon 6265], Schistosoma japonicum (species) [taxon 6182], Anisakis simplex (herring worm, species) [taxon 6269], Stenotrophomonas indicatrix (species) [taxon 2045451], Escherichia coli (E. coli, species) [taxon 562], Trypanosoma cruzi (species) [taxon 5693], Sus scrofa (pig, species) [taxon 9823], Dasyprocta azarae (Azara's agouti, species) [taxon 1541202], C. elegans [taxon 328850], Chlamydia suis (species) [taxon 83559], Echinococcus oligarthrus (species) [taxon 6212], Elaeophora elaphi (species) [taxon 1147741], Trichuris suis (pig whipworm, species) [taxon 68888], Schistosoma mansoni (species) [taxon 6183], Ovis aries (domestic sheep, species) [taxon 9940], Sphingomonas (genus) [taxon 13687], Parelaphostrongylus tenuis (species) [taxon 148309], Ascaris suum (pig roundworm, species) [taxon 6253], Enterobius vermicularis (human pinworm, species) [taxon 51028], Mansonella sp. (species) [taxon 2756192], Onchocerca ochengi (species) [taxon 42157], Oesophagostomum dentatum (nodular worm, species) [taxon 61180], Homo sapiens (human, species) [taxon 9606], Felis catus (cat, species) [taxon 9685], Plasmodium falciparum (malaria parasite P. falciparum, species) [taxon 5833], Taenia saginata (beef tapeworm, species) [taxon 6206], Nematoda (nematode, phylum) [taxon 6231], Sarcocystis (genus) [taxon 5812], Plasmodium malariae (species) [taxon 5858], Ascaris lumbricoides (common roundworm, species) [taxon 6252], Ancylostoma caninum (dog hookworm, species) [taxon 29170], Physeter macrocephalus (sperm whale, species) [taxon 9755], Taenia solium (pig tapeworm, species) [taxon 6204], Dicrocoelium dendriticum (species) [taxon 57078], Lichtheimia ramosa (species) [taxon 688394], Fukomys damarensis (Damara mole rat, species) [taxon 885580], gut metagenome (species) [taxon 749906], Plasmodium vivax (malaria parasite P. vivax, species) [taxon 5855], Bradyrhizobium (genus) [taxon 374], Blastocystis sp. subtype 6 (species) [taxon 944208], Shigella (genus) [taxon 620], Toxascaris leonina (species) [taxon 59264], Oryctolagus cuniculus (domestic rabbit, species) [taxon 9986], Morganella morganii (species) [taxon 582], Cervus elaphus (red deer, species) [taxon 9860], Cuniculus (genus) [taxon 723807], Bos taurus (bovine, species) [taxon 9913], Odocoileus virginianus (white-tailed deer, species) [taxon 9874], Baylisascaris ailuri (red panda roundworm, species) [taxon 941948]
- **Mutations:** T2T

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12548146/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12548146/full.md

## References

3 references — full list in the complete paper: https://tomesphere.com/paper/PMC12548146/full.md

---
Source: https://tomesphere.com/paper/PMC12548146