# Low-level contamination confounds population genomic analysis

**Authors:** Audrey K Ward, Eduardo F C Scopel, Brent Shuman, Michelle Momany, Douda Bensasson

PMC · DOI: 10.1093/g3journal/jkag021 · G3: Genes | Genomes | Genetics · 2026-01-30

## TL;DR

This paper shows that even small amounts of contamination in genome data can mislead population genomic analyses and suggests using B-allele frequency plots to detect it.

## Contribution

The study introduces a method to detect intraspecies contamination using B-allele frequency plots and shows how such contamination affects admixture and phylogenetic analyses.
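The B-allele frequency (BAF) at a site is the fraction of mapped reads carrying an allele that differs from the reference. The sketch below shows how a contamination screen built on BAF could work. It is not the authors' pipeline. It assumes per-site reference and alternate read counts at biallelic sites (for example, taken from a pileup). The function names and the "unexpected" window bounds (0.05 to 0.35 and the mirrored 0.65 to 0.95) are hypothetical. They reflect the idea that a clean haploid or diploid genome should place most sites near BAF 0, 0.5 or 1.

```python
from collections import Counter


def b_allele_frequency(ref_count: int, alt_count: int) -> float:
    """Fraction of reads at a site that carry the non-reference (B) allele."""
    depth = ref_count + alt_count
    if depth == 0:
        raise ValueError("site has no read coverage")
    return alt_count / depth


def baf_histogram(sites, bins=20):
    """Count sites per BAF bin; `sites` is an iterable of (ref_count, alt_count).

    The counts are what a BAF plot would display as a histogram.
    """
    hist = Counter()
    for ref, alt in sites:
        baf = b_allele_frequency(ref, alt)
        # min() keeps BAF == 1.0 inside the last bin
        hist[min(int(baf * bins), bins - 1)] += 1
    return [hist[i] for i in range(bins)]


def unexpected_fraction(sites, low=0.05, high=0.35):
    """Fraction of variable sites whose BAF falls between the expected peaks.

    Clean haploid/diploid genomes should cluster near 0.5 or 1 at variable
    sites; a cluster inside [low, high] or [1 - high, 1 - low] hints at a
    secondary genome. The default bounds are hypothetical, not from the paper.
    """
    bafs = [b_allele_frequency(r, a) for r, a in sites]
    variable = [b for b in bafs if b > 0]
    if not variable:
        return 0.0
    odd = [b for b in variable
           if low <= b <= high or (1 - high) <= b <= (1 - low)]
    return len(odd) / len(variable)


# Toy data: 5 reference sites, 3 fixed alternate sites, 2 sites at BAF 0.1
sites = [(30, 0)] * 5 + [(0, 30)] * 3 + [(27, 3)] * 2
print(unexpected_fraction(sites))
```

In the toy data, two of the five variable sites sit at BAF 0.1, so the screen reports 0.4. A real screen would inspect the full BAF plot rather than rely on a single summary number.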

## Key findings

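- Only eight of 1,298 publicly available Saccharomyces cerevisiae genomes showed at least 5% contamination.
- Contamination rates differed among sequencing centers: below 0.2% of genomes in one unusually large study, but 2% or 15% of genomes in other studies.
- As little as 5–10% contamination can change phylogenetic tree topologies and make contaminated strains appear genetically admixed between lineages.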

## Abstract

Genome sequence contamination has a variety of causes and can originate from within or between species. Previous research focused on contamination between distantly related species or on prokaryotes. Here, we test for intraspecies contamination by mapping short read genome data to a reference and visualizing the frequency of reads with single nucleotide differences from the reference. Out of 1,298 publicly available genome sequences investigated for Saccharomyces cerevisiae, a small number (eight genomes) show at least 5% contamination. Contamination rates differed, however, among sequencing centers: one unusually large study had a low contamination rate (below 0.2%), but the contamination rate was higher for other studies (2% or 15% of genomes). Using genome data contaminated in silico to known degrees, we showed that contamination is recognizable in plots with unexpected secondary allele (B-allele) frequencies of at least 5% and measured contamination effects on admixture and phylogenetic analysis in two fungal species. With a standard base calling pipeline, we found that contaminated genomes superficially appeared to produce good quality genome data. Yet as little as 5–10% genome contamination was enough to change phylogenetic tree topologies and make contaminated strains appear as hybrids between lineages (genetically admixed). We recommend the use of B-allele frequency plots to screen genome resequencing data for intraspecies contamination.
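The abstract describes contaminating genome data in silico to known degrees. The sketch below suggests why a contaminant fraction c shows up as a secondary BAF peak near c. It is a simplified model, not the authors' procedure (which mixed real read data). It assumes each site's reads come from the sample with probability 1 − c and from the contaminant with probability c. Each strain's within-strain allele frequency is taken to be 0, 0.5 or 1, and alternate reads are drawn by binomial sampling at a fixed depth. The function names, depth and seed are hypothetical.

```python
import random


def expected_baf(sample_af: float, contam_af: float, c: float) -> float:
    """Expected B-allele frequency when a fraction c of reads comes from a contaminant.

    sample_af / contam_af are within-strain B-allele frequencies
    (0 = homozygous reference, 0.5 = heterozygous, 1 = homozygous alternate).
    """
    return (1 - c) * sample_af + c * contam_af


def simulate_site(sample_af, contam_af, c, depth, rng):
    """Draw (ref_count, alt_count) for one site by binomial read sampling."""
    p = expected_baf(sample_af, contam_af, c)
    alt = sum(rng.random() < p for _ in range(depth))
    return depth - alt, alt


rng = random.Random(1)
# Sites where the sample is homozygous reference but a 5% contaminant
# is homozygous alternate: the BAF should cluster near 0.05.
sites = [simulate_site(0.0, 1.0, 0.05, 200, rng) for _ in range(1000)]
mean_baf = sum(alt / (ref + alt) for ref, alt in sites) / len(sites)
print(round(mean_baf, 3))
```

Under this model, a heterozygous diploid site in the sample (BAF 0.5) is pulled toward 0.5 × (1 − c) when the contaminant carries the reference allele. Contamination can therefore also appear as a broadened or split peak around 0.5, not only as a new peak near c.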

## Linked entities

- **Species:** Saccharomyces cerevisiae (taxon 4932)

## Full-text entities

- **Species:** Saccharomyces cerevisiae (baker's yeast, species) [taxon 4932]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC13042315/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC13042315/full.md

## References

37 references — full list in the complete paper: https://tomesphere.com/paper/PMC13042315/full.md

---
Source: https://tomesphere.com/paper/PMC13042315