Feature learning of virus genome evolution with the nucleotide skip-gram neural network
Hyunjin Shim

TL;DR
This paper introduces a neural network-based method inspired by NLP techniques to analyze virus genome evolution, revealing mutation patterns and interactions from time-series genomic data.
Contribution
It presents a novel application of the skip-gram neural network to encode allele relationships and detect mutation interactions in viral genomes.
Findings
Identified mutations linked to disinfectant adaptation.
Clustered allele vectors reveal mutation groups.
Model accounts for recombination rates in genome interactions.
Abstract
Recent studies reveal that even the smallest genomes, such as those of viruses, evolve through complex and stochastic processes, and the assumption of independent alleles is not valid in most applications. Advances in sequencing technologies produce multiple time-point whole-genome data, which enable potential interactions between these alleles to be investigated empirically. To investigate these interactions, we represent alleles as distributed vectors that encode relationships with other alleles in the course of evolution, and apply artificial neural networks to time-sampled whole-genome datasets for feature learning. We build this platform using methods and algorithms derived from Natural Language Processing (NLP), and we denote it as the nucleotide skip-gram neural network. We learn distributed vectors of alleles using the changes in allele frequency of echovirus 11 in the presence or absence…
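To make the idea of learning distributed allele vectors from time-sampled frequencies more concrete, the following is a minimal sketch of a skip-gram model in plain Python. It is not the paper's implementation. The allele names, the toy frequency table, the 0.05 change threshold, and the rule that alleles changing in the same sampling interval form (target, context) pairs are all assumptions made for illustration; the abstract does not specify how the training pairs are built. The helper names build_pairs, train_skipgram, and cosine are likewise hypothetical.

```python
import math
import random


def build_pairs(freqs, threshold=0.05):
    """Pair alleles whose frequency changes in the same interval.

    ASSUMPTION: this co-change rule is an illustrative stand-in for
    the context window of NLP skip-gram, not the paper's definition.
    freqs maps allele name -> list of frequencies, one per time point.
    """
    alleles = sorted(freqs)
    n_times = len(freqs[alleles[0]])
    pairs = []
    for t in range(1, n_times):
        changed = [a for a in alleles
                   if abs(freqs[a][t] - freqs[a][t - 1]) > threshold]
        for target in changed:
            for context in changed:
                if target != context:
                    pairs.append((target, context))
    return pairs


def train_skipgram(pairs, dim=4, epochs=200, lr=0.1, seed=0):
    """Train skip-gram with a full softmax by plain SGD.

    Returns a dict mapping each allele to its input-layer vector.
    """
    rng = random.Random(seed)
    vocab = sorted({a for pair in pairs for a in pair})
    index = {a: i for i, a in enumerate(vocab)}
    size = len(vocab)
    w_in = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(size)]
    w_out = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(size)]
    for _ in range(epochs):
        for target, context in pairs:
            h = w_in[index[target]]  # hidden layer = target's input vector
            c = index[context]
            scores = [sum(h[k] * w_out[j][k] for k in range(dim))
                      for j in range(size)]
            top = max(scores)  # subtract the max for numerical stability
            exps = [math.exp(s - top) for s in scores]
            total = sum(exps)
            grad_h = [0.0] * dim
            for j in range(size):
                # Cross-entropy gradient w.r.t. score j: softmax - one-hot.
                err = exps[j] / total - (1.0 if j == c else 0.0)
                for k in range(dim):
                    grad_h[k] += err * w_out[j][k]
                    w_out[j][k] -= lr * err * h[k]
            for k in range(dim):
                h[k] -= lr * grad_h[k]
    return {a: w_in[index[a]] for a in vocab}


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical allele frequencies at three sampling time points.
freqs = {
    "A100G": [0.0, 0.4, 0.4],
    "C250T": [0.0, 0.5, 0.5],
    "G400A": [0.1, 0.1, 0.6],
    "T550C": [0.2, 0.2, 0.7],
}

pairs = build_pairs(freqs)
vectors = train_skipgram(pairs)
print(cosine(vectors["A100G"], vectors["C250T"]))
```

The full softmax is used here only because the toy vocabulary is tiny; for genome-scale allele sets, negative sampling or a hierarchical softmax would be the usual substitutes. The learned vectors could then be compared with cosine similarity or passed to a clustering method to look for groups of co-varying mutations.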
