Efficient Prediction of Peptide Self-assembly through Sequential and Graphical Encoding
Zihan Liu, Jiaqi Wang, Yun Luo, Shuang Zhao, Wenbin Li, Stan Z. Li

TL;DR
This study systematically evaluates peptide encoding methods with state-of-the-art deep learning models on a large molecular dynamics dataset. It significantly improves the accuracy of peptide self-assembly prediction and provides a benchmark for future work on peptide property prediction.
Contribution
The work offers a comprehensive benchmark of peptide encoding techniques with state-of-the-art deep learning models and identifies the Transformer as the most effective model for peptide self-assembly prediction.
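The Transformer singled out above is built around multi-head self-attention, one of the components listed under Methods on this page. The following is a minimal pure-Python sketch of that mechanism applied to the residue embeddings of a decapeptide. It is not the authors' model: the embedding size, head count, and all weights (random and untrained) are illustrative assumptions, and helper names such as `multi_head_self_attention` are hypothetical.

```python
import math
import random

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product: (n x k) @ (k x m) -> (n x m)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def softmax_rows(a: Matrix) -> Matrix:
    """Row-wise softmax, shifted by the row max for numerical stability."""
    out = []
    for row in a:
        m = max(row)
        exps = [math.exp(v - m) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    """Gaussian values scaled by 1/sqrt(rows); stand-ins for trained parameters."""
    scale = 1.0 / math.sqrt(rows)
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def scaled_dot_product_attention(q: Matrix, k: Matrix, v: Matrix) -> tuple[Matrix, Matrix]:
    """Computes softmax(Q K^T / sqrt(d_k)) V and also returns the attention weights."""
    d_k = len(k[0])
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(q, transpose(k))]
    weights = softmax_rows(scores)
    return matmul(weights, v), weights


def multi_head_self_attention(x: Matrix, num_heads: int, rng: random.Random) -> Matrix:
    """x has shape (seq_len, d_model); the output has the same shape."""
    d_model = len(x[0])
    if d_model % num_heads != 0:
        raise ValueError("d_model must be divisible by num_heads")
    d_head = d_model // num_heads
    head_outputs = []
    for _ in range(num_heads):
        # Separate (hypothetical, untrained) query/key/value projections per head.
        w_q = random_matrix(d_model, d_head, rng)
        w_k = random_matrix(d_model, d_head, rng)
        w_v = random_matrix(d_model, d_head, rng)
        out, _weights = scaled_dot_product_attention(
            matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
        )
        head_outputs.append(out)
    # Concatenate the heads along the feature axis, then apply the output projection.
    concat = [sum((h[i] for h in head_outputs), []) for i in range(len(x))]
    w_o = random_matrix(d_model, d_model, rng)
    return matmul(concat, w_o)


if __name__ == "__main__":
    rng = random.Random(0)
    residues = random_matrix(10, 8, rng)  # 10 residue embeddings (a decapeptide), d_model = 8
    out = multi_head_self_attention(residues, num_heads=2, rng=rng)
    print(len(out), len(out[0]))  # 10 8
```

Setting `num_heads=1` reduces this to single-head scaled dot-product attention. A full Transformer encoder layer would add the residual connections, layer normalization, and position-wise feed-forward layer also listed under Methods, plus position encodings so that residue order is visible to the model.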
Findings
The Transformer outperforms the other evaluated models in peptide self-assembly prediction.
The choice of peptide encoding, as sequences or as graphs, significantly affects prediction accuracy.
The self-assembly of decapeptides is effectively predicted by the proposed models.
Abstract
In recent years, research applying deep learning to the prediction of peptide properties has grown rapidly, driven by the significant development and market potential of peptides. Molecular dynamics simulation has made it possible to collect large peptide datasets efficiently, providing reliable training data for deep learning. However, peptide encoding, which is essential for AI-assisted peptide tasks, has not yet been analyzed systematically, and closing this gap is an urgent step toward improving prediction accuracy. To address this issue, we first collect a large, high-quality simulation dataset of peptide self-assembly containing over 62,000 samples generated by coarse-grained molecular dynamics (CGMD). We then systematically investigate the effect of encoding amino acids into sequences and molecular graphs using state-of-the-art…
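To make the two encodings concrete, here is a hedged sketch of how a peptide could be turned into (a) a sequence of integer tokens or one-hot vectors and (b) a graph. It is a simplification under stated assumptions: the graph is residue-level (one node per amino acid, with edges along the backbone), whereas the abstract describes molecular graphs, which would normally be built at the atom level. The alphabet ordering and the function names are hypothetical, not taken from the paper.

```python
# Hypothetical sketch of sequential vs. (residue-level) graph encoding of a peptide.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes (assumed order)
TOKEN_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def encode_sequence(peptide: str) -> list[int]:
    """Sequential encoding: map each residue to an integer token id."""
    return [TOKEN_INDEX[aa] for aa in peptide.upper()]


def one_hot(peptide: str) -> list[list[int]]:
    """Sequential encoding as one 20-dimensional one-hot vector per residue."""
    vectors = []
    for idx in encode_sequence(peptide):
        vec = [0] * len(AMINO_ACIDS)
        vec[idx] = 1
        vectors.append(vec)
    return vectors


def encode_residue_graph(peptide: str) -> tuple[list[int], list[tuple[int, int]]]:
    """Simplified graph encoding: nodes are residues (token ids as node labels),
    edges join consecutive residues and are stored in both directions."""
    nodes = encode_sequence(peptide)
    edges = []
    for i in range(len(nodes) - 1):
        edges.append((i, i + 1))
        edges.append((i + 1, i))
    return nodes, edges


if __name__ == "__main__":
    print(encode_sequence("KLVFF"))            # [8, 9, 17, 4, 4]
    nodes, edges = encode_residue_graph("KLVFF")
    print(len(nodes), len(edges))              # 5 8
```

Under this alphabet ordering, `encode_sequence("KLVFF")` gives `[8, 9, 17, 4, 4]`, and the corresponding residue graph has 5 nodes and 8 directed edges. Storing each backbone bond in both directions is a common convention for undirected graphs in graph neural network code; an atom-level molecular graph would instead use atoms as nodes and chemical bonds as edges.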
Taxonomy
Topics: Machine Learning in Bioinformatics · Chemical Synthesis and Analysis · Supramolecular Self-Assembly in Materials
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Byte Pair Encoding · Position-Wise Feed-Forward Layer · Residual Connection · Absolute Position Encodings · Adam · Layer Normalization · Label Smoothing
