DivPro: diverse protein sequence design with direct structure recovery guidance
Xinyi Zhou, Guibao Shen, Yingcong Chen, Guangyong Chen, Pheng Ann Heng

TL;DR
DivPro is a new method for designing diverse protein sequences that fold into the same structure, using structure prediction to guide the design process.
Contribution
DivPro introduces a probabilistic sequence space model that improves diversity while maintaining structural accuracy.
Findings
DivPro generates diverse sequences that fold into target structures with high accuracy.
The model outperforms existing methods in sequence diversity while maintaining structural fidelity.
Structure prediction models like AlphaFold2 confirm the reliability of designed sequences.
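To make the recovery-versus-diversity distinction concrete, here is a minimal Python sketch of the two quantities. It assumes equal-length, pre-aligned sequences and uses common conventions: recovery as the fraction of positions matching the native sequence, and diversity as the mean pairwise fraction of mismatched positions. These definitions, the function names, and the toy sequences are illustrative assumptions, not necessarily the exact metrics or data used by DivPro.

```python
from itertools import combinations


def sequence_identity(a: str, b: str) -> float:
    """Fraction of positions at which two equal-length sequences agree."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    if not a:
        raise ValueError("sequences must be non-empty")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def sequence_recovery(designed: str, native: str) -> float:
    """Recovery of one design: its identity to the native sequence."""
    return sequence_identity(designed, native)


def pairwise_diversity(designs: list[str]) -> float:
    """Mean (1 - identity) over all unordered pairs of designs."""
    pairs = list(combinations(designs, 2))
    if not pairs:
        raise ValueError("need at least two designs")
    return sum(1.0 - sequence_identity(a, b) for a, b in pairs) / len(pairs)


# Hypothetical toy data: one native sequence and three designs for it.
native = "MKTAYIAK"
designs = ["MKTAYIAK", "MRSAYLAK", "LKTGFIAR"]

recoveries = [sequence_recovery(d, native) for d in designs]
diversity = pairwise_diversity(designs)
print("recovery per design:", recoveries)
print("mean pairwise diversity:", round(diversity, 3))
```

In this toy example the first design recovers the native sequence perfectly but adds nothing to diversity, while the other two trade recovery for diversity. Whether such lower-recovery designs still fold into the target structure cannot be read from the sequences alone, which is why a structure-based check is needed alongside these metrics.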
Abstract
Structure-based protein design, which aims to generate sequences that fold into desired structures, is crucial for designing proteins with novel structures and functions. Current deep learning-based methods primarily focus on training and evaluating models using sequence recovery-based metrics. However, this approach overlooks the inherent ambiguity in the relationship between protein sequences and structures. Relying solely on sequence recovery as a training objective limits the models' ability to produce diverse sequences that maintain similar structures. These limitations become more pronounced when dealing with remote homologous proteins, which share functional and structural similarities despite low sequence identity. Here, we present DivPro, a model that learns to design diverse sequences that can fold into similar structures. To improve sequence diversity, instead of learning a…
Taxonomy
Topics: Protein Structure and Dynamics · Machine Learning in Bioinformatics · RNA and protein synthesis mechanisms
