Plug & Play Directed Evolution of Proteins with Gradient-based Discrete MCMC
Patrick Emami, Aidan Perreault, Jeffrey Law, David Biagioni, Peter C. St. John

TL;DR
This paper presents a gradient-based MCMC framework for in silico protein evolution that combines multiple models to efficiently discover high-functionality protein variants without retraining.
Contribution
It introduces a novel sampling method that integrates unsupervised and supervised models in a product of experts, enabling efficient exploration of protein sequence space.
Findings
Successfully identified high-likelihood protein variants with multiple mutations
Demonstrated effectiveness across various pre-trained protein language models
Achieved efficient in silico directed evolution without model fine-tuning
Abstract
A long-standing goal of machine-learning-based protein engineering is to accelerate the discovery of novel mutations that improve the function of a known protein. We introduce a sampling framework for evolving proteins in silico that supports mixing and matching a variety of unsupervised models, such as protein language models, and supervised models that predict protein function from sequence. By composing these models, we aim to improve our ability to evaluate unseen mutations and constrain search to regions of sequence space likely to contain functional proteins. Our framework achieves this without any model fine-tuning or re-training by constructing a product of experts distribution directly in discrete protein space. Instead of resorting to brute force search or random sampling, which is typical of classic directed evolution, we introduce a fast MCMC sampler that uses gradients to…
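As a rough illustration of the two ideas named in the abstract, the sketch below builds a product of experts over discrete sequences (the unnormalized log-probability is a weighted sum of expert log-scores) and samples from it with a Metropolis-Hastings chain whose single-site proposals are biased by gradients. Everything in it is an assumption made for illustration: the experts are hypothetical random score tables rather than protein language models or function predictors, the names `make_toy_expert`, `poe_log_score`, `proposal_log_probs`, `mh_step`, and `run_chain` are invented here, and the first-order proposal is one generic way to use gradients in a discrete sampler. The abstract is cut off before it describes the authors' sampler, so this should not be read as their method.

```python
import math
import random

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids
A = len(ALPHABET)


def make_toy_expert(length, seed):
    """Hypothetical stand-in for a model: a table of per-site scores W[i][a]."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(A)] for _ in range(length)]


def poe_log_score(experts, weights, seq):
    """Unnormalized log-probability of the product of experts."""
    return sum(w * sum(W[i][a] for i, a in enumerate(seq))
               for W, w in zip(experts, weights))


def poe_grad(experts, weights, seq):
    """Gradient of the log-score w.r.t. the one-hot encoding of seq.

    For these linear toy experts it is just the weighted sum of the tables
    and does not depend on seq; a neural expert would need autodiff here.
    """
    return [[sum(w * W[i][a] for W, w in zip(experts, weights))
             for a in range(A)] for i in range(len(seq))]


def proposal_log_probs(experts, weights, seq, temp=2.0):
    """Log-probabilities over single-site substitutions (position, residue).

    Each move is scored by a first-order estimate of the change in log-score.
    """
    grad = poe_grad(experts, weights, seq)
    logits = {(i, a): (grad[i][a] - grad[i][cur]) / temp
              for i, cur in enumerate(seq) for a in range(A) if a != cur}
    top = max(logits.values())
    log_z = top + math.log(sum(math.exp(v - top) for v in logits.values()))
    return {move: v - log_z for move, v in logits.items()}


def mh_step(experts, weights, seq, rng):
    """One Metropolis-Hastings step with the gradient-informed proposal."""
    fwd = proposal_log_probs(experts, weights, seq)
    moves = list(fwd)
    i, a = rng.choices(moves, weights=[math.exp(fwd[m]) for m in moves])[0]
    new_seq = seq[:i] + [a] + seq[i + 1:]
    # The reverse move restores the original residue at position i.
    rev = proposal_log_probs(experts, weights, new_seq)
    log_accept = (poe_log_score(experts, weights, new_seq)
                  - poe_log_score(experts, weights, seq)
                  + rev[(i, seq[i])] - fwd[(i, a)])
    if rng.random() < math.exp(min(0.0, log_accept)):
        return new_seq
    return seq


def run_chain(experts, weights, start, n_steps, seed=0):
    """Run the chain and return the best-scoring sequence visited."""
    rng = random.Random(seed)
    seq = list(start)
    best, best_score = seq, poe_log_score(experts, weights, seq)
    for _ in range(n_steps):
        seq = mh_step(experts, weights, seq, rng)
        score = poe_log_score(experts, weights, seq)
        if score > best_score:
            best, best_score = seq, score
    return best, best_score


experts = [make_toy_expert(12, seed=1), make_toy_expert(12, seed=2)]
start = [0] * 12  # poly-alanine stand-in for a wild-type sequence
best, score = run_chain(experts, [1.0, 1.0], start, n_steps=200)
print("".join(ALPHABET[a] for a in best), round(score, 3))
```

Because the experts are combined only through a sum of log-scores, swapping one expert for another requires no retraining of the rest, which is the property the abstract emphasizes. The Metropolis-Hastings correction keeps the chain targeting the product distribution even when the first-order estimate is inexact, as it would be for nonlinear models.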
Taxonomy
Topics: Evolutionary Algorithms and Applications · Genomics and Phylogenetic Studies · Software Engineering Research
