Towards A Generative Protein Evolution Machine with DPLM-Evo
Xinyou Wang, Liang Hong, Jiasheng Ye, Zaixiang Zheng, Yu Li, Shujian Huang, Quanquan Gu

TL;DR
DPLM-Evo is a novel discrete diffusion model for proteins that explicitly predicts biological mutations, enabling more realistic evolution simulation, guided generation, and improved mutation effect prediction.
Contribution
The paper introduces DPLM-Evo, a diffusion framework that explicitly models substitutions and insertions/deletions (indels), improving biological plausibility and flexibility in protein sequence generation and editing.
Findings
Achieves state-of-the-art mutation effect prediction on ProteinGym.
Enables variable-length simulated evolution and post-editing of proteins.
Improves sequence understanding through explicit mutation modeling.
Abstract
Proteins are shaped by gradual evolution under biophysical and functional constraints. Protein language models learn rich evolutionary constraints from large-scale sequences, and discrete diffusion-based protein language models (e.g., DPLMs) are promising for both understanding and generation. However, existing DPLMs typically rely on masking-based absorbing diffusion that contradicts a simple biological intuition: proteins evolve through accumulated edits, not by emerging from masks. Consequently, these frameworks lack explicit pretraining objectives for substitution and insertion/deletion (indel) operations, limiting both optimization-style post-editing and flexible guided generation. To address these limitations, we present DPLM-Evo, an evolutionary discrete diffusion framework that explicitly predicts substitution, insertion, and deletion operations during denoising. DPLM-Evo…
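To make the three edit operations named in the abstract concrete, here is a minimal Python sketch of applying substitutions, insertions, and deletions to an amino-acid sequence. It only illustrates the edit vocabulary and why sequence length can change. It is not the DPLM-Evo model or its denoising procedure, and the names `Edit` and `apply_edits` are hypothetical, introduced here for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


@dataclass(frozen=True)
class Edit:
    """Hypothetical edit record (an assumption, not from the paper)."""
    op: str                        # "sub", "ins", or "del"
    pos: int                       # 0-based index into the ORIGINAL sequence
    residue: Optional[str] = None  # new residue for "sub" and "ins"


def apply_edits(seq: str, edits: List[Edit]) -> str:
    """Apply edits whose positions all refer to the original sequence.

    "ins" inserts the residue before position `pos`; `pos == len(seq)`
    appends. Edits are applied right to left so that earlier indices stay
    valid, and at a shared position "sub"/"del" runs before "ins".
    """
    out = list(seq)
    ordered = sorted(edits, key=lambda e: (e.pos, e.op != "ins"), reverse=True)
    for e in ordered:
        if e.op in ("sub", "ins") and e.residue not in AMINO_ACIDS:
            raise ValueError(f"invalid residue: {e.residue!r}")
        if e.op == "sub":
            out[e.pos] = e.residue
        elif e.op == "del":
            del out[e.pos]
        elif e.op == "ins":
            if not 0 <= e.pos <= len(seq):
                raise IndexError(f"insert position out of range: {e.pos}")
            out.insert(e.pos, e.residue)
        else:
            raise ValueError(f"unknown operation: {e.op!r}")
    return "".join(out)


if __name__ == "__main__":
    parent = "MKTAYIAK"
    child = apply_edits(
        parent,
        [Edit("sub", 1, "R"), Edit("del", 3), Edit("ins", 8, "G")],
    )
    print(parent, "->", child)
```

In this toy setting, a masking-based process could only fill fixed positions, whereas an edit-based process can also lengthen or shorten the sequence, which is the property the abstract highlights for indels.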
