# Spatiotemporal structure of SARS-CoV-2 mutational frequencies in wastewater samples from Ontario

**Authors:** Paula Magbor, William Z. Wang, Gopi Gugan, Abayomi S. Olabode, Devan G. Becker, Valeria R. Parreira, Opeyemi U. Lawal, Amber Fedynak, Linkang Zhang, Fozia Rizvi, Melinda Precious, Christopher T. DeGroot, Lawrence Goodridge, Art F. Y. Poon, Pablo Colunga-Salas

PMC · DOI: 10.1371/journal.pone.0333945 · PLOS One · 2025-10-16

## TL;DR

This study uses next-generation sequencing of Ontario wastewater samples collected from October 2021 to June 2023 to measure how SARS-CoV-2 mutation frequencies vary over time and space.

## Contribution

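The study evaluates the normalized inner product of mutation frequency vectors (a cosine distance) as a way to quantify spatiotemporal structure in fragmented wastewater sequencing data, which cannot support comparative methods that require full-length genomes.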

## Key findings

- Temporal structure in mutation frequencies was largely driven by the spread of variants of concern.
- Genetic similarity between samples was negatively correlated with geographic distance.
- Spatial differentiation in SARS-CoV-2 genomic variation was measurable at the provincial scale.
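
The relationship in the second finding can be explored by comparing pairwise genetic similarity with the distance between sampling sites. The sketch below is only an illustration under stated assumptions: this summary does not say which geographic distance measure or correlation statistic the authors used, so the haversine formula and Pearson's r are stand-ins, and all function names and the input layout are hypothetical.

```python
import math
from itertools import combinations

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def similarity_vs_distance(coords, similarity):
    """Correlate pairwise similarity with pairwise geographic distance.

    coords: list of (lat, lon) per sampling site (hypothetical layout).
    similarity: square matrix of pairwise genetic similarities.
    """
    pairs = list(combinations(range(len(coords)), 2))
    dists = [haversine_km(coords[i], coords[j]) for i, j in pairs]
    sims = [similarity[i][j] for i, j in pairs]
    return pearson_r(dists, sims)
```

A negative return value from `similarity_vs_distance` would be consistent with similarity falling as sites get farther apart. Because pairwise entries share samples and are not independent, a significance test on such data would typically rely on permutation (for example, a Mantel-style test) rather than the standard p-value for r.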

## Abstract

Starting October 2021, the Ontario wastewater surveillance initiative has used next-generation sequencing (NGS) to monitor SARS-CoV-2 RNA in wastewater samples. The fragmented and heterogeneous nature of these data precludes using comparative methods that require full-length genome sequences. In this study, we investigate the utility of the inner product of the vectors of mutation frequencies to quantify the temporal and spatial structure of these data. Raw sequence data were trimmed and mapped to the SARS-CoV-2 reference genome to extract mutation frequencies and coverage statistics. These data were filtered for samples with incomplete metadata, positions with insufficient coverage (< 100 reads), or mutations with frequencies below 1%. For every pair of samples, we calculated the inner product of the respective mutation frequency vectors, and normalized the result to obtain a cosine distance. In total, we processed 1,619 samples from October 2021 to June 2023. The average depth was 7,693 reads, with mean coverage of 24,853 nt. A total of 241,078 mutations were detected in these samples. We restricted our analysis to 20 consecutive months with samples from at least one health region per month. A projection of the resulting cosine distance matrix revealed substantial temporal structure largely driven by the rapid spread of variants of concern. Genetic similarity, as quantified by the normalized dot product of mutation frequencies, was significantly negatively correlated with the geographic distance between sampling locations. These results suggest that spatial differentiation in the genomic variation of SARS-CoV-2 among wastewater samples can be measured, even at the relatively small scale of a single province.
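
The filtering and pairwise-distance steps described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' pipeline. The 100-read and 1% thresholds come from the abstract; the data layout (a dict mapping a mutation label to a `(frequency, coverage)` pair), the function names, and the choice to treat undetected or filtered-out mutations as frequency 0 are assumptions made for the example.

```python
import math

MIN_COVERAGE = 100    # reads; threshold taken from the abstract
MIN_FREQUENCY = 0.01  # 1%; threshold taken from the abstract


def filter_mutations(sample):
    """Keep mutations with adequate coverage and frequency.

    `sample` maps a mutation label to (frequency, coverage); this layout
    is hypothetical and chosen only for the example.
    """
    return {
        mut: freq
        for mut, (freq, cov) in sample.items()
        if cov >= MIN_COVERAGE and freq >= MIN_FREQUENCY
    }


def cosine_distance(a, b):
    """One minus the normalized inner product of two sparse frequency vectors.

    Mutations absent from a vector count as frequency 0 (an assumption).
    """
    dot = sum(freq * b[mut] for mut, freq in a.items() if mut in b)
    norm_a = math.sqrt(sum(f * f for f in a.values()))
    norm_b = math.sqrt(sum(f * f for f in b.values()))
    if norm_a == 0 or norm_b == 0:
        return float("nan")  # undefined when a sample retains no mutations
    return 1.0 - dot / (norm_a * norm_b)


def distance_matrix(samples):
    """Pairwise cosine distances for a list of raw sample dicts."""
    vectors = [filter_mutations(s) for s in samples]
    return [[cosine_distance(u, v) for v in vectors] for u in vectors]
```

Two samples with proportional frequency vectors give a distance of 0, and two samples that share no retained mutations give a distance of 1. Treating a low-coverage position as frequency 0 is a simplification; a fuller treatment would distinguish "not observed" from "not measurable".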

## Linked entities

- **Diseases:** SARS-CoV-2 (MONDO:0100096)

## Full-text entities

- **Species:** Severe acute respiratory syndrome coronavirus 2 (no rank) [taxon 2697049]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12530563/full.md

## Figures

9 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12530563/full.md

## References

29 references — full list in the complete paper: https://tomesphere.com/paper/PMC12530563/full.md

---
Source: https://tomesphere.com/paper/PMC12530563