Unsupervised detection and fitness estimation of emerging SARS-CoV-2 variants. Application to wastewater samples (ANRS0160)
Alexandra Lefebvre, Vincent Maréchal, Arnaud Gloaguen, Obépine Consortium, Amaury Lambert, Yvon Maday

TL;DR
This paper introduces an unsupervised statistical method to detect and estimate the fitness of emerging SARS-CoV-2 variants from pooled wastewater data, enabling early identification without prior mutation knowledge.
Contribution
It presents a novel unsupervised clustering approach that uses mixture models and the EM algorithm to analyze mutation frequency trajectories in pooled samples, avoiding the biases introduced by lineage classification.
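The sketch below illustrates the clustering idea. It fits a mixture of trajectories with EM, but the model is a simplification of our own and may differ from the paper's likelihood and fitting procedure. We assume that mutant read counts y_ij at read depth n_ij are binomial and that each cluster k follows a logistic frequency curve f_k(t) = logistic(a_k + s_k * t). Under that assumption, s_k can be read as the cluster's growth advantage per unit of time. The function name `fit_trajectory_mixture`, the slope-based initialization, and the fixed number of Newton steps are all hypothetical choices.

```python
import numpy as np


def logistic(x):
    # Clip the argument so np.exp cannot overflow.
    return 1.0 / (1.0 + np.exp(-np.clip(x, -500.0, 500.0)))


def fit_trajectory_mixture(y, n, t, K, n_iter=50, newton_steps=5):
    """Hypothetical EM sketch: cluster M mutations into K logistic trajectories.

    y, n: (M, T) arrays of mutant read counts and read depths per sample.
    t:    (T,) sampling times.
    Returns mixture weights pi (K,), intercepts a (K,), slopes s (K,)
    and responsibilities r (M, K).
    """
    M, T = y.shape
    X = np.column_stack([np.ones(T), t])  # design matrix: intercept + time

    # Assumed initialisation: rank mutations by their empirical logit slope
    # and split them into K equal-sized groups.
    freq = (y + 0.5) / (n + 1.0)
    emp_slope = np.polyfit(t, np.log(freq / (1.0 - freq)).T, 1)[0]
    labels = np.minimum(np.argsort(np.argsort(emp_slope)) * K // M, K - 1)
    r = np.eye(K)[labels]
    theta = np.zeros((K, 2))  # (a_k, s_k) for each cluster

    for _ in range(n_iter):
        # M-step: mixture weights, then one binomial logistic regression
        # per cluster on responsibility-weighted counts (Newton steps).
        pi = r.mean(axis=0)
        for k in range(K):
            Y, N = r[:, k] @ y, r[:, k] @ n
            for _ in range(newton_steps):
                p = logistic(X @ theta[k])
                grad = X.T @ (Y - N * p)
                # Observed information (negative Hessian) of the log-likelihood.
                info = X.T @ (X * (N * p * (1.0 - p))[:, None])
                theta[k] += np.linalg.solve(info + 1e-8 * np.eye(2), grad)

        # E-step: per-cluster binomial log-likelihoods. The binomial
        # coefficient is identical for every cluster, so it is dropped.
        f = np.clip(logistic(theta @ X.T), 1e-12, 1.0 - 1e-12)  # (K, T)
        loglik = y @ np.log(f).T + (n - y) @ np.log(1.0 - f).T
        loglik += np.log(np.maximum(pi, 1e-12))
        loglik -= loglik.max(axis=1, keepdims=True)  # numerical stability
        r = np.exp(loglik)
        r /= r.sum(axis=1, keepdims=True)

    return pi, theta[:, 0], theta[:, 1], r
```

In the M-step, each cluster's curve reduces to one weighted binomial regression on counts aggregated over its mutations. This works because all mutations in a cluster are assumed to share the same frequency trajectory. The pooled nature of wastewater reads enters only through the per-sample counts and depths.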
Findings
Successfully grouped mutations by variant in wastewater data
Estimated variant fitness consistent with known viral dynamics
Detected the Alpha variant early, comparable to supervised methods
Abstract
Repeated waves of emerging variants during the SARS-CoV-2 pandemic have highlighted the urgency of collecting longitudinal genomic data and of developing statistical methods based on time-series analyses to detect new threatening lineages and estimate their fitness early. Most models study the evolution of the prevalence of particular lineages over time and require a prior classification of sequences into lineages. Such a process is prone to inducing delays and bias. More recently, a few authors have studied the evolution of the prevalence of mutations over time with alternative clustering approaches, avoiding specific lineage classification. However, most of the aforementioned methods are either non-parametric or unsuited to pooled data, such as those from wastewater samples. In this context, we propose an alternative unsupervised method for clustering mutations according to…
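The standard constant-selection derivation below shows what "fitness" can mean for a frequency trajectory. It rests on our own assumption, which is not necessarily the paper's parametrization. The variant and the rest of the viral population are taken to grow exponentially at constant rates r_v and r_w, from initial abundances V_0 and W_0.

```latex
\[
p(t) = \frac{V_0 e^{r_v t}}{V_0 e^{r_v t} + W_0 e^{r_w t}}
     = \frac{p_0 e^{s t}}{1 - p_0 + p_0 e^{s t}},
\qquad s = r_v - r_w,\quad p_0 = \frac{V_0}{V_0 + W_0},
\]
\[
\operatorname{logit} p(t) = \log\frac{p(t)}{1 - p(t)} = \operatorname{logit} p_0 + s\,t .
\]
```

Under this assumption, the logit of the variant frequency is linear in time. Its slope s is the variant's growth advantage per unit of time, so a variant with s > 0 rises along a sigmoid even when its initial frequency p_0 is small.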
Taxonomy
Topics: SARS-CoV-2 detection and testing · Biosensors and Analytical Detection · Advanced biosensing and bioanalysis techniques
Methods: Network On Network · Sparse Evolutionary Training
