Mutational paths with sequence-based models of proteins: from sampling to mean-field characterisation
Eugenio Mauri (LPENS), Simona Cocco (LPENS), Rémi Monasson (LPENS)

TL;DR
This paper introduces a framework for analyzing mutational paths in proteins with sequence-based models, combining a path-sampling algorithm with mean-field theory to characterize mutational dynamics and to extend Kimura's evolutionary distance estimate to epistatic models.
Contribution
It presents a novel algorithm for sampling mutational paths and applies mean-field theory to characterize their properties in sequence-based protein models.
Findings
Benchmarked the sampling algorithm on exactly solvable in silico protein models
Applied the algorithm to data-driven models of natural proteins learned with Restricted Boltzmann Machines
Extended Kimura's evolutionary distance estimate to sequence-based epistatic models of selection
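Kimura-type distance estimates convert the observed fraction p of differing sites between two sequences into an expected number d of substitutions per site, correcting for multiple hits at the same site. The summary does not give the formula that is extended, so as a hedged illustration we assume the simplest non-epistatic baseline: sites evolve independently, and each mutates at the same rate to any of its q-1 alternative states (a Jukes-Cantor-type model, our assumption). Then p = s(1 - exp(-d/s)) with s = (q-1)/q, and inverting gives d = -s ln(1 - p/s). The function names below are hypothetical; the epistatic models treated in the paper are precisely the case where this independent-site assumption fails.

```python
import math


def expected_p_diff(d, q=20):
    """Expected fraction of differing sites after d substitutions per site,
    assuming independent sites with q equivalent states (Jukes-Cantor-type)."""
    s = (q - 1) / q  # saturation level: fraction differing at infinite time
    return s * (1.0 - math.exp(-d / s))


def independent_site_distance(p_diff, q=20):
    """Invert expected_p_diff: substitutions per site from the observed
    fraction of differing sites (independent-site assumption, not epistatic)."""
    s = (q - 1) / q
    if not 0.0 <= p_diff < s:
        raise ValueError("p_diff must lie in [0, (q-1)/q)")
    return -s * math.log(1.0 - p_diff / s)


if __name__ == "__main__":
    # q=4 recovers the classic nucleotide Jukes-Cantor correction;
    # q=20 corresponds to amino acids without gaps.
    print(independent_site_distance(0.3, q=4))
    print(independent_site_distance(0.4, q=20))
```

The correction grows without bound as p approaches the saturation level s, which is why such estimates become unreliable for distantly related sequences.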
Abstract
Identifying and characterizing mutational paths is an important issue in evolutionary biology and in bioengineering. We here introduce a generic description of mutational paths in terms of the goodness of sequences and of the mutational dynamics (how sequences change) along the path. We first propose an algorithm to sample mutational paths, which we benchmark on exactly solvable models of proteins in silico, and apply to data-driven models of natural proteins learned from sequence data with Restricted Boltzmann Machines. We then use mean-field theory to characterize the properties of mutational paths for different mutational dynamics of interest, and show how it can be used to extend Kimura's estimate of evolutionary distances to sequence-based epistatic models of selection.
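The abstract describes a path through two ingredients: the goodness of each sequence along the path and the mutational dynamics linking consecutive sequences. A minimal sketch of how such paths could be sampled, under assumptions of our own rather than the paper's exact algorithm: sequences are scored with a toy Potts energy E(v) built from random fields h and couplings J (a stand-in for a learned model such as an RBM), goodness enters through a weight exp(-beta * sum_t E(v_t)), and the mutational dynamics is encoded by a hypothetical penalty lam times the Hamming distance between consecutive sequences. Intermediate sequences are updated by single-site Metropolis moves while the endpoints stay fixed. The names `make_toy_potts`, `sample_path`, `beta` and `lam` are illustrative.

```python
import numpy as np


def make_toy_potts(n, q, seed=0):
    """Hypothetical stand-in for a learned model: random fields h[i, a] and
    symmetric couplings J[i, j, a, b] = J[j, i, b, a] with J[i, i] = 0."""
    rng = np.random.default_rng(seed)
    h = rng.normal(size=(n, q))
    J = 0.2 * rng.normal(size=(n, n, q, q))
    J = 0.5 * (J + J.transpose(1, 0, 3, 2))
    J[np.arange(n), np.arange(n)] = 0.0
    return h, J


def potts_energy(seq, h, J):
    """E(v) = -sum_i h_i(v_i) - sum_{i<j} J_ij(v_i, v_j); lower means 'better'."""
    idx = np.arange(len(seq))
    pair = J[idx[:, None], idx[None, :], seq[:, None], seq[None, :]]
    return -h[idx, seq].sum() - 0.5 * pair.sum()  # 0.5: each pair counted twice


def delta_energy(seq, i, b, h, J):
    """Energy change when site i of seq is mutated from seq[i] to b."""
    a = seq[i]
    others = np.flatnonzero(np.arange(len(seq)) != i)
    dJ = J[i, others, b, seq[others]] - J[i, others, a, seq[others]]
    return -(h[i, b] - h[i, a]) - dJ.sum()


def sample_path(start, end, T, h, J, beta=1.0, lam=1.0, n_sweeps=100, seed=0):
    """Metropolis sampling of a path v_0..v_T (T >= 2) with fixed endpoints,
    weighted by exp(-beta * sum_t E(v_t) - lam * sum_t Hamming(v_t, v_{t+1}))."""
    rng = np.random.default_rng(seed)
    n, q = h.shape
    diff = np.flatnonzero(start != end)
    path = np.tile(start, (T + 1, 1))
    for t in range(1, T + 1):
        # Initial guess: switch the differing sites to their end values progressively.
        k = round(t / T * len(diff))
        path[t, diff[:k]] = end[diff[:k]]
    for _ in range(n_sweeps):
        for _ in range((T - 1) * n):
            t = rng.integers(1, T)  # only intermediate sequences are updated
            i = rng.integers(n)
            b = rng.integers(q)
            a = path[t, i]
            if a == b:
                continue
            # Change in Hamming distance to the two neighbouring sequences.
            d_ham = sum(int(b != path[s, i]) - int(a != path[s, i]) for s in (t - 1, t + 1))
            cost = beta * delta_energy(path[t], i, b, h, J) + lam * d_ham
            if cost <= 0 or rng.random() < np.exp(-cost):
                path[t, i] = b
    return path


if __name__ == "__main__":
    h, J = make_toy_potts(n=10, q=4)
    rng = np.random.default_rng(1)
    start, end = rng.integers(4, size=10), rng.integers(4, size=10)
    path = sample_path(start, end, T=12, h=h, J=J, beta=1.0, lam=2.0)
    print([int((path[t] != path[t + 1]).sum()) for t in range(12)])  # mutations per step
    print([round(float(potts_energy(v, h, J)), 2) for v in path])  # energy along the path
```

Raising `lam` favors paths with few mutations per step, while raising `beta` favors low-energy (good) intermediate sequences; the paper instead uses mean-field theory to characterize path properties for different mutational dynamics, whereas this sketch only samples them numerically.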
Taxonomy
Topics: Evolution and Genetic Dynamics · RNA and protein synthesis mechanisms · Genomics and Phylogenetic Studies
