EvoFlows: Evolutionary Edit-Based Flow-Matching for Protein Engineering
Nicolas Deutschmann, Constance Ferragu, Jonathan D. Ziegler, Shayan Aziznejad, Eli Bixby

TL;DR
EvoFlows is a protein sequence-to-sequence model that applies a controllable number of insertions, deletions, and substitutions to a template sequence, generating variants that explore farther from the template than baseline models while remaining consistent with natural protein families.
Contribution
EvoFlows introduces an edit-based, variable-length modeling approach for protein engineering that learns mutational trajectories between evolutionarily related sequences via edit flows, predicting both which mutation to make and where it should occur.
Findings
EvoFlows generates protein variants consistent with natural families.
It explores farther from templates than baseline models.
EvoFlows supports insertions, deletions, and substitutions.
Abstract
We introduce EvoFlows, a variable-length protein sequence-to-sequence modeling approach designed for protein engineering. Existing protein language models are poorly suited for optimization tasks: autoregressive models require full sequence generation, masked language and discrete diffusion models rely on pre-specified mutation locations, and no existing methods naturally support insertions and deletions relative to a template sequence. EvoFlows learns mutational trajectories between evolutionarily related protein sequences via edit flows, allowing it to perform a controllable number of mutations (insertions, deletions, and substitutions) on a template sequence, predicting not only _which_ mutation to perform, but also _where_ it should occur. Through extensive _in silico_ evaluation on diverse protein families from UniRef and OAS, we show that EvoFlows generates variants that remain…
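To make the edit vocabulary described in the abstract concrete, here is a minimal Python sketch of applying a fixed list of insertions, deletions, and substitutions to a template sequence. It is not the EvoFlows model: nothing here is learned, and the names `Edit`, `apply_edit`, and `apply_edits` are hypothetical. The sketch also assumes that each position is a 0-based index into the sequence as it stands when that edit is applied, which the abstract does not specify.

```python
from dataclasses import dataclass
from typing import Iterable, Literal, Optional

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


@dataclass(frozen=True)
class Edit:
    """One edit: which operation ("ins", "del", "sub") and where (hypothetical type)."""
    kind: Literal["ins", "del", "sub"]
    pos: int                        # 0-based index into the current sequence (assumption)
    residue: Optional[str] = None   # required for "ins" and "sub", unused for "del"


def _check_residue(residue: Optional[str]) -> str:
    # Accept exactly one standard amino-acid letter.
    if residue is None or len(residue) != 1 or residue not in AMINO_ACIDS:
        raise ValueError(f"invalid residue: {residue!r}")
    return residue


def apply_edit(seq: str, edit: Edit) -> str:
    """Return a new sequence with a single edit applied."""
    if edit.kind == "ins":
        # Insertion before index pos; pos == len(seq) appends at the end.
        if not 0 <= edit.pos <= len(seq):
            raise IndexError(f"insert position {edit.pos} out of range")
        return seq[:edit.pos] + _check_residue(edit.residue) + seq[edit.pos:]
    if not 0 <= edit.pos < len(seq):
        raise IndexError(f"position {edit.pos} out of range")
    if edit.kind == "del":
        return seq[:edit.pos] + seq[edit.pos + 1:]
    if edit.kind == "sub":
        return seq[:edit.pos] + _check_residue(edit.residue) + seq[edit.pos + 1:]
    raise ValueError(f"unknown edit kind: {edit.kind!r}")


def apply_edits(template: str, edits: Iterable[Edit]) -> str:
    """Apply edits one after another, as a short mutational trajectory."""
    seq = template
    for edit in edits:
        seq = apply_edit(seq, edit)
    return seq


if __name__ == "__main__":
    trajectory = [
        Edit("sub", 2, "S"),  # MKTAYIAK  -> MKSAYIAK
        Edit("ins", 4, "G"),  # MKSAYIAK  -> MKSAGYIAK
        Edit("del", 0),       # MKSAGYIAK -> KSAGYIAK
    ]
    print(apply_edits("MKTAYIAK", trajectory))  # KSAGYIAK
```

The number of edits in the list plays the role of the controllable mutation count. In the approach the abstract describes, the choice of operation and position would come from the trained model rather than from a hand-written list.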
