# Optimizing data-driven excellence: Canada’s approach to using pathogen test datasets for quality control, pipeline development and training initiatives

**Authors:** Kara D. Loos, Mark Horsman, Jeff Tuff, Kimia Kamelian, Darian Hole, Chanchal Yadav, Kirsten Palmier, Kristyn Burak, Molly Pratt, Connor Chato, Anna Majer, Shari Tyson, Grace E. Seo, Philip Mabon, Elsie Grudeski, Rhiannon Huzarewich, Russell Mandes, Anneliese Landgraff, Jennifer R. Tanner, Natalie Knox, Morag Graham, Gary Van Domselaar, Jessica Minion, Nathalie Bastien, Timothy Booth, Madison Chapel, Kirsten Biggar, Ana Duggan, Catherine Yoshida, Andrea Tyler

PMC · DOI: 10.1099/mgen.0.001505 · Microbial Genomics · 2026-01-27

## TL;DR

This paper describes Canada's use of standardized test datasets to improve genomic surveillance of SARS-CoV-2 by ensuring reliable and comparable results across different labs and platforms.

## Contribution

The study introduces a framework using curated test datasets and a customized R script for validating genomic analysis workflows in public health.

## Key findings

- Standardized test datasets give laboratories a benchmark for validating the accuracy and consistency of SARS-CoV-2 genomic analysis workflows.
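- A customized R script compares analyses of the Illumina sequencing data and generates tables and figures that highlight the differences between them; the test datasets themselves cover both Illumina and Nanopore sequencing.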
- Publicly accessible datasets on Zenodo enhance reproducibility and support training initiatives.

## Abstract

Pathogen genomic surveillance is globally recognized as a pillar of public health. This field has expanded rapidly following the onset of the coronavirus disease 2019 (COVID-19) pandemic, and there is an urgent need to ensure the quality, comparability and reliability of the results of genomic analyses across diverse settings and analytical platforms. Currently, no methodology or framework has been universally adopted to mitigate this issue. This study aimed to provide a solution within the Canadian public health landscape by using standardized test datasets for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) genomic analysis. In this context, a test dataset refers to a curated set of genomic sequences designed to evaluate the accuracy, consistency and performance of sequencing workflows, bioinformatics pipelines and analytical tools. These datasets serve as benchmarks, allowing laboratories to validate their methodologies and ensure comparability across different platforms. The test datasets included in this analysis were selected based on the use of well-characterized experimental protocols from the application of specimen selection criteria, through to sequence generation. Datasets generated using Illumina and Nanopore sequencing of samples from COVID-19 patients in Saskatchewan, Canada, were used and included clean controls, variable lineages and spiked-in lower-quality run data. Illumina libraries were sequenced using ARTIC network PCR amplification, while Nanopore libraries underwent similar protocols with some modifications. Public test dataset access on Zenodo further facilitates reproducibility, providing data summary outputs and pipeline environment files. A customized R script was developed to compare Illumina data, generating multiple tables and figures highlighting comparisons between analyses. 
The significance of this study lies in its contribution to the implementation of bioinformatic pipeline validation tools and protocols, which are essential for reliable genomic surveillance and outbreak response. By establishing a structured framework for computational validation, this study enhances the accuracy, comparability and efficiency of genomic surveillance in an evolving landscape of viral strains and testing strategies.
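To make the benchmarking idea concrete, the following is a minimal sketch of comparing a pipeline's per-sample summary against the expected values for a test dataset. It is not the authors' R script. The column names (`sample`, `lineage`, `genome_coverage`), the sample IDs, the values, and the `coverage_tolerance` threshold are all hypothetical and chosen only for illustration.

```python
import csv
import io

# Hypothetical summary tables: column names, sample IDs and values are
# illustrative assumptions, not outputs from the paper's datasets.
EXPECTED_CSV = """sample,lineage,genome_coverage
S1,B.1.1.7,99.1
S2,AY.25,98.4
S3,BA.1,97.8
"""

OBSERVED_CSV = """sample,lineage,genome_coverage
S1,B.1.1.7,99.0
S2,AY.27,98.5
S3,BA.1,91.2
"""


def load_summary(text):
    """Parse a CSV summary into a dict of rows keyed by sample ID."""
    return {row["sample"]: row for row in csv.DictReader(io.StringIO(text))}


def compare_summaries(expected, observed, coverage_tolerance=1.0):
    """Return (sample, field, expected, observed) tuples for each discrepancy.

    coverage_tolerance is an assumed threshold in percentage points.
    """
    issues = []
    for sample, exp in expected.items():
        obs = observed.get(sample)
        if obs is None:
            # The pipeline run produced no row for this benchmark sample.
            issues.append((sample, "missing", "present", "absent"))
            continue
        if exp["lineage"] != obs["lineage"]:
            issues.append((sample, "lineage", exp["lineage"], obs["lineage"]))
        diff = abs(float(exp["genome_coverage"]) - float(obs["genome_coverage"]))
        if diff > coverage_tolerance:
            issues.append(
                (sample, "genome_coverage",
                 exp["genome_coverage"], obs["genome_coverage"])
            )
    return issues


issues = compare_summaries(load_summary(EXPECTED_CSV), load_summary(OBSERVED_CSV))
for issue in issues:
    print(issue)
```

With these made-up tables, the comparison flags a lineage mismatch for `S2` and a coverage drop for `S3`. In practice, the expected table would come from the published data summary outputs for the test dataset, and the fields compared would depend on what the pipeline under validation reports.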

## Linked entities

- **Diseases:** coronavirus disease 2019 (MONDO:0100096), severe acute respiratory syndrome coronavirus 2 (MONDO:0100096)

## Full-text entities

- **Diseases:** infectious disease (MESH:D003141), bacterial pathogen (MESH:D001424), CPHLN (MESH:D007757), infection (MESH:D007239), COVID-19 (MESH:D000086382)
- **Species:** Homo sapiens (human, species) [taxon 9606], Severe acute respiratory syndrome coronavirus 2 (no rank) [taxon 2697049]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12847982/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12847982/full.md

## References

11 references — full list in the complete paper: https://tomesphere.com/paper/PMC12847982/full.md

---
Source: https://tomesphere.com/paper/PMC12847982