# Contamination of fungal genomes of Onygenaceae (Phylum Ascomycota) in public databases: incidence, detection, and impact

**Authors:** Alan Omar Granados-Casas, Ana Fernández-Bravo, Alberto Miguel Stchigel, José Francisco Cano-Lira

PMC · DOI: 10.1186/s12864-025-12223-3 · 2025-11-19

## TL;DR

This study detects contamination, mainly bacterial, in publicly available genomes of the family Onygenaceae and shows that removing it improves assembly quality metrics and brings contamination below 3%.

## Contribution

The study applies a contamination screening and removal workflow, built around a custom Kraken 2 database of four high-quality genomes from closely related fungal taxa, to eleven public Onygenaceae genomes.

## Key findings

- Four of the eleven analyzed genomes showed low-quality statistics and contamination levels between 5 and 12%, mainly of bacterial origin.
- Removing the contaminated regions lowered contamination to below 3% in all cases.
- Filtered assemblies showed improved assembly quality metrics and fewer bacteria-associated protein families.

## Abstract

Genomic datasets often contain unwanted, foreign, or erroneous nucleotide sequences that do not belong to the organism under study. Such contamination can significantly compromise genome analyses, reducing the accuracy and reliability of the results. Despite its potential impact, few studies have addressed the contamination of fungal genomes by exogenous sequences. Here, we analyzed eleven publicly available genomes of fungi from the family Onygenaceae, retrieved from the National Center for Biotechnology Information (NCBI) database. A comprehensive quality assessment was performed, evaluating genome completeness, contiguity, and contamination levels. Genomes with lower quality statistics and putative contamination were selected for further improvement. To enhance assembly quality, we built a custom Kraken 2 database including four high-quality genomes of closely related fungal taxa. After filtering, we reassessed the genomes to compare contiguity, completeness, and contamination levels before and after the process. Furthermore, structural and functional annotation was conducted to evaluate changes in predicted proteins, protein families, and domains. Additionally, average nucleotide identity (ANI) and phylogenetic analyses were performed to further assess the impact of the filtering. Four genomes showed low-quality statistics and contamination levels between 5 and 12%, mainly of bacterial origin. After removing the contaminated regions, assembly quality metrics improved, and contamination levels dropped below 3% in all cases. Functional annotation of the filtered assemblies revealed a reduction in bacteria-associated protein families. Our results demonstrate the presence of contamination in publicly available Onygenaceae fungal genomes and highlight its potential to bias downstream analyses. We emphasize the importance of contamination screening and removal to ensure reliable genomic data for fungal research.

The online version contains supplementary material available at 10.1186/s12864-025-12223-3.
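
The abstract does not spell out how contigs were filtered after Kraken 2 classification, so the following is only a minimal sketch of one plausible approach, not the authors' pipeline. It assumes Kraken 2's default per-sequence output (tab-separated: `C`/`U` status, sequence ID, taxid, length, LCA mapping), run without `--use-names`, and it assumes that a contig is kept when it is unclassified or when its assigned taxid is in an explicit allow-list. The function names, the toy contigs, and the exact-match allow-list (no taxonomy tree traversal) are all hypothetical simplifications.

```python
from typing import Dict, Iterable, Iterator, List, Set, Tuple

Record = Tuple[str, str]  # (FASTA header without '>', sequence)


def parse_kraken2_output(lines: Iterable[str]) -> Dict[str, Tuple[bool, int]]:
    """Map sequence ID -> (is_classified, taxid) from Kraken 2 per-sequence output."""
    calls: Dict[str, Tuple[bool, int]] = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        # Only the first three columns are needed here.
        status, seq_id, taxid = line.split("\t")[:3]
        calls[seq_id] = (status == "C", int(taxid))
    return calls


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_contigs(
    fasta_lines: Iterable[str],
    calls: Dict[str, Tuple[bool, int]],
    keep_taxids: Set[int],
    keep_unclassified: bool = True,  # assumption: unclassified contigs are retained
) -> Tuple[List[Record], List[Record]]:
    """Split contigs into (kept, removed) based on their Kraken 2 taxid."""
    kept: List[Record] = []
    removed: List[Record] = []
    for header, seq in read_fasta(fasta_lines):
        seq_id = header.split()[0]  # Kraken 2 reports the ID up to the first space
        classified, taxid = calls.get(seq_id, (False, 0))
        if classified:
            keep = taxid in keep_taxids
        else:
            keep = keep_unclassified
        (kept if keep else removed).append((header, seq))
    return kept, removed


def removed_fraction(kept: List[Record], removed: List[Record]) -> float:
    """Fraction of assembly length (in bp) that was filtered out."""
    kept_bp = sum(len(seq) for _, seq in kept)
    removed_bp = sum(len(seq) for _, seq in removed)
    total = kept_bp + removed_bp
    return removed_bp / total if total else 0.0


# Toy data: taxid 33184 is Onygenaceae, taxid 2 is Bacteria; contigs are made up.
kraken_lines = [
    "C\tcontig_1\t33184\t8\t33184:4",
    "C\tcontig_2\t2\t4\t2:1",
    "U\tcontig_3\t0\t4\t0:1",
]
fasta_lines = [">contig_1", "ACGTACGT", ">contig_2 toy", "GGCC", ">contig_3", "ATAT"]

calls = parse_kraken2_output(kraken_lines)
kept, removed = filter_contigs(fasta_lines, calls, keep_taxids={33184})
fraction = removed_fraction(kept, removed)
```

In a real run the allow-list would need to cover every taxid in the target clade, since Kraken 2 often assigns contigs at species or genus level. Measuring the removed fraction by length rather than by contig count gives a number that is easier to compare with a percentage-based contamination estimate, although the two are not necessarily computed the same way.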

## Linked entities

- **Species:** Onygenaceae (taxon 33184)

## Full-text entities

- **Diseases:** fungal (MESH:D009181)

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12628603/full.md

---
Source: https://tomesphere.com/paper/PMC12628603