Unsupervisedly Prompting AlphaFold2 for Few-Shot Learning of Accurate Folding Landscape and Protein Structure Prediction
Jun Zhang, Sirui Liu, Mengyun Chen, Haotian Chu, Min Wang, Zidong Wang, Jialiang Yu, Ningxi Ni, Fan Yu, Diqing Chen, Yi Isaac Yang, Boxin Xue, Lijiang Yang, Yuan Liu, Yi Qin Gao

TL;DR
This paper introduces EvoGen, a meta generative model that helps AlphaFold2 predict protein structures accurately when few sequence homologs are available, improving its performance on orphan sequences and enabling exploration of conformational diversity.
Contribution
EvoGen improves AlphaFold2's performance on protein sequences with few available homologs by prompting it with calibrated or virtually generated homologue sequences, enabling few-shot learning and probabilistic structure generation.
Findings
EvoGen significantly improves folding accuracy for low-homology targets.
The combined method enables exploration of alternative protein conformations.
EvoGen enhances AlphaFold2's applicability to orphan sequences.
Abstract
Data-driven predictive methods that can efficiently and accurately transform protein sequences into biologically active structures are highly valuable for scientific research and medical development. Determining an accurate folding landscape using co-evolutionary information is fundamental to the success of modern protein structure prediction methods. As the state of the art, AlphaFold2 has dramatically raised accuracy without performing explicit co-evolutionary analysis. Nevertheless, its performance still depends strongly on the available sequence homologs. Based on an interrogation of the cause of this dependence, we present EvoGen, a meta generative model, to remedy the underperformance of AlphaFold2 on poor-MSA targets. By prompting the model with calibrated or virtually generated homologue sequences, EvoGen helps AlphaFold2 fold accurately in the low-data regime and even…
Taxonomy
Topics: Machine Learning in Bioinformatics · Protein Structure and Dynamics · RNA and protein synthesis mechanisms
