Deep Learning Model for Amyloidogenicity Prediction using a Pre-trained Protein LLM

Zohra Yagoub; Hafida Bouziane

arXiv:2508.12575·cs.LG·August 19, 2025

TL;DR

This study combines a pretrained protein large language model with bidirectional LSTM and GRU networks to predict amyloidogenic regions, reaching 84.5% accuracy in 10-fold cross-validation and 83% on the test dataset, and demonstrating the potential of LLMs in bioinformatics.

Contribution

It introduces an approach that uses contextual features from a pretrained protein LLM together with bidirectional LSTM and GRU networks for amyloidogenicity prediction, and it reports competitive accuracy.

Findings

01

Achieved 84.5% accuracy in 10-fold cross-validation

02

Achieved 83% accuracy on the test dataset

03

Demonstrated the effectiveness of LLMs in bioinformatics

Abstract

The prediction of amyloidogenicity in peptides and proteins remains a focal point of ongoing bioinformatics research. A crucial step in this field is the application of advanced computational methodologies. Many recent approaches to predicting amyloidogenicity in proteins rely heavily on evolutionary motifs and the individual properties of amino acids. It is becoming increasingly evident that features based on sequence information show high predictive performance. Consequently, our study evaluated contextual features of protein sequences obtained from a pretrained protein large language model, leveraging bidirectional LSTM and GRU to predict amyloidogenic regions in peptide and protein sequences. Our method achieved an accuracy of 84.5% on 10-fold cross-validation and an accuracy of 83% on the test dataset. Our results demonstrate competitive performance, highlighting the potential of LLMs…
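To make the 10-fold cross-validation figure concrete, the following is a minimal sketch of how such an accuracy is typically computed. It is not the authors' code: the splitting scheme (a plain shuffled split, no stratification), the seed, and every name below (`k_fold_indices`, `cross_validated_accuracy`, the toy threshold classifier and toy data) are assumptions made for illustration. In the paper's setting, `fit` and `predict` would wrap the LSTM/GRU model trained on protein LLM features.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and deal them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validated_accuracy(features, labels, fit, predict, k=10, seed=0):
    """Train on k-1 folds, score on the held-out fold, and average over folds."""
    folds = k_fold_indices(len(labels), k, seed)
    fold_accuracies = []
    for held_out in folds:
        held_out_set = set(held_out)
        train = [i for i in range(len(labels)) if i not in held_out_set]
        model = fit([features[i] for i in train], [labels[i] for i in train])
        correct = sum(predict(model, features[i]) == labels[i] for i in held_out)
        fold_accuracies.append(correct / len(held_out))
    return sum(fold_accuracies) / k


# Hypothetical stand-in classifier: threshold at the midpoint of the class means.
def fit_threshold(xs, ys):
    positives = [x for x, y in zip(xs, ys) if y == 1]
    negatives = [x for x, y in zip(xs, ys) if y == 0]
    return (sum(positives) / len(positives) + sum(negatives) / len(negatives)) / 2


def predict_threshold(threshold, x):
    return 1 if x > threshold else 0


# Toy one-dimensional data (an assumption, not amyloid data).
toy_features = [i / 100 for i in range(100)]
toy_labels = [1 if x >= 0.5 else 0 for x in toy_features]
toy_accuracy = cross_validated_accuracy(
    toy_features, toy_labels, fit_threshold, predict_threshold
)
```

The reported cross-validation accuracy is the mean of the per-fold held-out accuracies, whereas the test-set accuracy comes from data never used in any fold.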
