Deep Learning Model for Amyloidogenicity Prediction using a Pre-trained Protein LLM
Zohra Yagoub, Hafida Bouziane

TL;DR
This study combines contextual features from a pretrained protein large language model with bidirectional LSTM and GRU networks to predict amyloidogenic regions, reaching 84.5% accuracy in 10-fold cross-validation and 83% on the test dataset.
Contribution
The paper introduces an approach that feeds contextual features from a pretrained protein LLM into bidirectional LSTM and GRU networks for amyloidogenicity prediction, and reports competitive accuracy.
Findings
Achieved 84.5% accuracy in 10-fold cross-validation
Achieved 83% accuracy on the test dataset
Highlighted the potential of protein LLMs in bioinformatics
Abstract
The prediction of amyloidogenicity in peptides and proteins remains a focal point of ongoing bioinformatics research, and applying advanced computational methodologies is a crucial step in this field. Many recent approaches to predicting amyloidogenicity within proteins rely heavily on evolutionary motifs and the individual properties of amino acids. It is becoming increasingly evident that features based on sequence information show high predictive performance. Consequently, our study evaluated contextual features of protein sequences obtained from a pretrained protein large language model, using bidirectional LSTM and GRU networks to predict amyloidogenic regions in peptide and protein sequences. Our method achieved an accuracy of 84.5% in 10-fold cross-validation and an accuracy of 83% on the test dataset. Our results demonstrate competitive performance, highlighting the potential of LLMs…
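The 10-fold cross-validation accuracy quoted in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names (`k_fold_indices`, `cross_val_accuracy`, `majority_class_trainer`) are hypothetical, the majority-class model is only a stand-in for the LLM-feature BiLSTM/GRU classifier, and the abstract does not say whether the folds were stratified, so plain shuffled folds are assumed here.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index starting at i,
    # so fold sizes differ by at most one.
    return [indices[i::k] for i in range(k)]


def cross_val_accuracy(features, labels, train_fn, k=10, seed=0):
    """Return the mean held-out accuracy over k folds.

    train_fn(train_x, train_y) must return a predict(x) callable.
    Assumes len(labels) >= k so that no fold is empty.
    """
    folds = k_fold_indices(len(labels), k, seed)
    fold_accuracies = []
    for held_out in folds:
        held_out_set = set(held_out)
        train_idx = [i for i in range(len(labels)) if i not in held_out_set]
        # Train only on the k-1 folds that are not held out.
        predict = train_fn([features[i] for i in train_idx],
                           [labels[i] for i in train_idx])
        correct = sum(predict(features[i]) == labels[i] for i in held_out)
        fold_accuracies.append(correct / len(held_out))
    return sum(fold_accuracies) / k


def majority_class_trainer(train_x, train_y):
    """Hypothetical stand-in model: always predicts the most common training label."""
    majority = max(set(train_y), key=train_y.count)
    return lambda x: majority
```

In practice `train_fn` would fit the real classifier on the training folds and return its prediction function. The separate test-set accuracy is then measured once, on data that never enters any fold.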
