MSAGPT: Neural Prompting Protein Structure Prediction via MSA Generative Pre-Training
Bo Chen, Zhilei Bei, Xingyi Cheng, Pan Li, Jie Tang, Le Song

TL;DR
MSAGPT introduces a novel MSA generative pretraining method that improves protein structure prediction accuracy, especially in low MSA regimes, by modeling complex evolutionary patterns and leveraging feedback from AlphaFold2.
Contribution
The paper presents MSAGPT, a new approach that uses MSA generative pretraining with evolutionary encoding and feedback mechanisms to enhance protein structure prediction in low MSA scenarios.
Findings
MSAGPT outperforms existing methods in low MSA regimes.
The model effectively captures complex coevolutionary patterns.
Leveraging AlphaFold2 feedback improves prediction accuracy.
Abstract
Multiple Sequence Alignment (MSA) plays a pivotal role in unveiling the evolutionary trajectories of protein families. The accuracy of protein structure predictions is often compromised for protein sequences that lack sufficient homologous information to construct a high-quality MSA. Although various methods have been proposed to generate virtual MSAs under these conditions, they fall short in comprehensively capturing the intricate coevolutionary patterns within an MSA or require guidance from external oracle models. Here we introduce MSAGPT, a novel approach to prompt protein structure predictions via MSA generative pretraining in the low-MSA regime. MSAGPT employs a simple yet effective 2D evolutionary positional encoding scheme to model complex evolutionary patterns. Endowed by this, its flexible 1D MSA decoding framework facilitates zero- or few-shot learning. Moreover, we demonstrate…
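The abstract pairs a 2D positional encoding with a 1D decoding framework but does not give the details. As a rough illustration only, the sketch below shows one way an aligned MSA could be flattened row by row into a single token sequence while each token keeps a (row, column) index pair. The function name `flatten_msa` and the toy alignment are hypothetical and are not taken from the paper; how MSAGPT actually computes its positional encoding is not stated here.

```python
from typing import List, Tuple


def flatten_msa(msa: List[str]) -> Tuple[List[str], List[int], List[int]]:
    """Flatten an aligned MSA row by row into one token sequence.

    Hypothetical helper, not the authors' code. Returns the tokens plus,
    for each token, the index of the sequence it came from (row) and its
    alignment column.
    """
    if not msa:
        return [], [], []
    width = len(msa[0])
    if any(len(seq) != width for seq in msa):
        raise ValueError("all aligned sequences must have the same length")

    tokens: List[str] = []
    row_ids: List[int] = []
    col_ids: List[int] = []
    for r, seq in enumerate(msa):
        for c, residue in enumerate(seq):
            tokens.append(residue)
            row_ids.append(r)  # which homolog the token belongs to
            col_ids.append(c)  # which alignment column the token sits in
    return tokens, row_ids, col_ids


# Toy alignment: a query followed by two made-up homologs ('-' is a gap).
msa = ["MKV", "MRV", "M-V"]
tokens, row_ids, col_ids = flatten_msa(msa)
```

Under this assumed layout, a decoder that reads the flattened sequence left to right can still tell which residues share an alignment column, because tokens from different rows carry the same column index.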
Taxonomy
Topics: Machine Learning in Bioinformatics · Genetics, Bioinformatics, and Biomedical Research
