Information Clustering and Pathogen Evolution
Baptiste Filoche, Stefan Hohenegger

TL;DR
This paper introduces a novel clustering method based on Fisher information invariance to analyze pathogen evolution, enabling better tracking of variants and prediction of epidemic dynamics from genetic sequence data.
Contribution
It proposes a new information-theoretic clustering approach that captures pathogen interactions and evolution, validated with SARS-CoV-2 spike protein data.
Findings
Clusters variants with similar epidemiological interactions
Identifies mutations leading to dominant variants
Predicts the growth of dangerous variants
Abstract
Recent outbreaks of infectious diseases have been monitored closely from an epidemiological and microbiological perspective. Extracting from this wealth of data the information that is relevant for the evolution of the pathogen, and predicting the further dynamics of the epidemic, is a difficult task. We therefore consider clusterings of these data to condense this information. We interpret the relative abundance of (genetic) variants of the pathogen as a time-dependent probability distribution and consider clusterings that keep the Fisher information (approximately) invariant, in order to ensure that they capture the dynamics of the pandemic. By first studying analytic models, we show that this condition groups together variants that interact in a similar fashion with the population and show comparable adaptation to the epidemiological situation. Moreover, we demonstrate that the same…
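The invariance condition in the abstract can be illustrated with a small numerical sketch. This is not the authors' implementation. It assumes that the Fisher information is taken with respect to time, I(t) = sum_i (dp_i/dt)^2 / p_i, and that a clustering simply sums the relative abundances of the variants it groups together. The abundances, growth rates, and function names below are invented for illustration.

```python
def fisher_information(p, dp):
    """Fisher information w.r.t. time: sum_i (dp_i/dt)^2 / p_i (assumed definition)."""
    return sum(d * d / q for q, d in zip(p, dp) if q > 0)


def merge(values, clusters):
    """Coarse-grain a list by summing the entries that belong to each cluster."""
    return [sum(values[i] for i in cluster) for cluster in clusters]


# Hypothetical snapshot of three variants at one instant.
# p: relative abundances (sum to 1); g: relative growth rates d/dt log p_i.
p = [0.5, 0.3, 0.2]
g = [0.2, 0.2, -0.8]                 # chosen so that sum_i p_i * g_i = 0
dp = [q * r for q, r in zip(p, g)]   # dp_i/dt = p_i * g_i

i_full = fisher_information(p, dp)

# Clustering A groups variants 0 and 1, which share the same growth rate.
same = [[0, 1], [2]]
i_same = fisher_information(merge(p, same), merge(dp, same))

# Clustering B groups variants 1 and 2, whose growth rates differ.
mixed = [[0], [1, 2]]
i_mixed = fisher_information(merge(p, mixed), merge(dp, mixed))

print(f"unclustered: {i_full:.4f}")
print(f"same-rate clustering: {i_same:.4f}")
print(f"mixed-rate clustering: {i_mixed:.4f}")
```

Under these assumptions, merging variants with equal relative growth rates leaves the Fisher information unchanged, while merging variants with different rates lowers it. This matches the abstract's statement that the condition groups variants showing comparable adaptation.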
Taxonomy
Topics: Evolution and Genetic Dynamics · Fractal and DNA sequence analysis · Genomics and Phylogenetic Studies
