# An inter-laboratory study characterizes the impact of bioinformatic approaches on genome-based cluster detection for foodborne bacterial pathogens

**Authors:** Leonie F. Forth, Burkhard Malorny, Markus Bönn, Erik Brinks, Grégoire Denay, Carlus Deneke, Hosny El-Adawy, Jennie Fischer, Jannika Fuchs, Ekkehard Hiller, Nancy Bretschneider, Sylvia Kleta, Stefanie Lüth, Tilman Schultze, Henning Petersen, Michaela Projahn, Christian Schäfers, Kerstin Stingl, Andreas J. Stroehlein, Laura Uelze, Kathrin Szabo, Anne Wöhlke, Jörg Linde

PMC · DOI: 10.3389/fmicb.2025.1629731 · Frontiers in Microbiology · 2025-11-03

## TL;DR

This inter-laboratory study shows how differences in bioinformatics tools and in the interpretation of read and assembly quality metrics affect genome-based cluster detection for four foodborne bacterial pathogens.

## Contribution

The study quantifies inter-laboratory variability in genome-based cluster detection and identifies key factors influencing reproducibility.

## Key findings

- Intra-species contamination was the most significant factor driving variability in decisions to include or exclude samples, which in turn changed cluster composition.
- cgMLST cluster variability was most influenced by sample inclusion/exclusion decisions.
- SNP calling with Snippy produced concordant results across participants, except for C. jejuni when recombination filtering was applied.
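
To illustrate why sample inclusion or exclusion can change cluster composition, here is a minimal sketch. It assumes single-linkage clustering on pairwise allele distances, which this summary does not confirm as the study's method. The sample names, distances, and the threshold of 7 are hypothetical and not taken from the paper.

```python
from itertools import combinations


def single_linkage_clusters(distances, samples, threshold):
    """Group samples transitively: two samples share a cluster if a chain of
    pairwise distances <= threshold connects them (single linkage)."""
    parent = {s: s for s in samples}

    def find(s):
        # Follow parent links to the cluster representative (with path halving).
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    for a, b in combinations(samples, 2):
        if distances[frozenset((a, b))] <= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for s in samples:
        clusters.setdefault(find(s), set()).add(s)
    return sorted(sorted(c) for c in clusters.values())


# Hypothetical pairwise allele distances (illustration only).
distances = {
    frozenset(("A", "B")): 4,
    frozenset(("B", "C")): 5,
    frozenset(("A", "C")): 9,
}
THRESHOLD = 7  # hypothetical cluster cut-off

# One lab keeps all three samples; another excludes B (e.g. for suspected contamination).
all_samples = single_linkage_clusters(distances, ["A", "B", "C"], THRESHOLD)
without_b = single_linkage_clusters(distances, ["A", "C"], THRESHOLD)

print(all_samples)
print(without_b)
```

In this toy case, sample B links A and C into one cluster. Once B is excluded, A and C are farther apart than the cut-off and no longer cluster together, so two laboratories with identical typing results can still report different clusters.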

## Abstract

Accurate assignment of whole-genome sequences to clusters in foodborne outbreak investigations remains challenging. Variability in bioinformatics tools and quality metrics significantly impacts clustering outcomes. This study assessed inter-laboratory variance in cluster identification by providing four datasets of 50 raw Illumina paired-end sequences covering Shiga toxin-producing Escherichia coli, Listeria monocytogenes, Salmonella enterica, and Campylobacter jejuni. Following general rules of a specified guideline, participants applied in-house protocols for read quality assessment, 7-gene MLST, cgMLST, and SNP calling, then assigned samples to predefined focus clusters based on allele distance (AD) and mutations. Results revealed that differences in the interpretation of raw sequence and genome assembly quality influenced sample inclusion and finally cluster composition. Here, intra-species contamination was the most significant factor driving variability in decisions on whether to include or exclude samples. With one exception, 7-gene Multilocus-Sequence Typing (MLST) yielded consistent sequence types using different bioinformatics tools. The largest influence on cgMLST-defined clusters was the inclusion or exclusion of samples. Regarding bioinformatics, cgMLST was mainly reproducible. For S. enterica, discrepancies due to different software (Ridom SeqSphere+ vs. ChewieSnake) were larger than discrepancies due to different schemas. For other species, different schemas introduced larger discrepancies than different software. Most notably, C. jejuni cluster assignment was strongly affected by cgMLST schemas differing by a factor of two in the number of loci. SNP calling using Snippy produced concordant results across participants, except for C. jejuni when recombination filtering was used. This study highlights the impact caused by different interpretations of quality values when assessing clusters. 
Low-resolution cgMLST schemas were unsuitable for Campylobacter jejuni, and clustering near cut-off values was sensitive to bioinformatics tool selection. Standardized protocols are essential for reliable inter-laboratory comparison in foodborne pathogen surveillance.
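
The allele distance (AD) used for cluster assignment can be sketched as a count of loci at which two cgMLST profiles carry different alleles. The example below is an illustration under stated assumptions, not the participants' pipelines: loci with a missing call in either sample are skipped (one common convention; tools differ here), and the locus names, allele numbers, and the two schemas are hypothetical.

```python
def allele_distance(profile_a, profile_b, loci):
    """Count loci in `loci` where both samples have an allele call and the calls differ.

    Assumption: a locus with a missing call (None) in either sample is ignored.
    """
    distance = 0
    for locus in loci:
        a = profile_a.get(locus)
        b = profile_b.get(locus)
        if a is None or b is None:
            continue  # skip loci that cannot be compared
        if a != b:
            distance += 1
    return distance


# Hypothetical allele profiles: locus name -> allele number (None = not called).
sample_1 = {"locus1": 1, "locus2": 3, "locus3": 7, "locus4": 2, "locus5": None, "locus6": 4}
sample_2 = {"locus1": 1, "locus2": 5, "locus3": 7, "locus4": 9, "locus5": 2, "locus6": 8}

# Two hypothetical schemas, one with half as many loci as the other.
full_schema = ["locus1", "locus2", "locus3", "locus4", "locus5", "locus6"]
half_schema = ["locus1", "locus2", "locus3"]

ad_full = allele_distance(sample_1, sample_2, full_schema)
ad_half = allele_distance(sample_1, sample_2, half_schema)

print(ad_full, ad_half)
```

With the same pair of samples, the smaller schema reports a lower distance because it compares fewer loci. Under a fixed cut-off, that alone can move a pair from outside a cluster to inside it, which is consistent with the sensitivity to schema size reported for C. jejuni.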

## Linked entities

- **Species:** Listeria monocytogenes (taxon 1639), Salmonella enterica (taxon 28901), Campylobacter jejuni (taxon 197)

## Full-text entities

- **Species:** Listeria monocytogenes (species) [taxon 1639], Campylobacter jejuni (species) [taxon 197], Salmonella enterica (species) [taxon 28901], Escherichia coli (E. coli, species) [taxon 562]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12621568/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12621568/full.md

## References

45 references — full list in the complete paper: https://tomesphere.com/paper/PMC12621568/full.md

---
Source: https://tomesphere.com/paper/PMC12621568