# Harnessing deep learning for proteome-scale detection of amyloid signaling motifs

**Authors:** Krzysztof Pysz, Jakub Gałązka, Witold Dyrka

PMC · DOI: 10.1093/bioinformatics/btaf200 · 2025-07-15

## TL;DR

This paper introduces bidirectional LSTM and BERT-based deep learning models that detect amyloid signaling motifs (ASMs) in large protein datasets, including at genome scale.

## Contribution

The study introduces tailored deep learning models for detecting amyloid signaling motifs at proteome scale, outperforming existing methods.

## Key findings

- Bidirectional LSTM and BERT-based models effectively detect amyloid signaling motifs, including novel ones.
- The models perform well on genome-scale datasets and identify motifs from remotely related families.
- The developed models are available as open-source tools for broader use.

## Abstract

Amyloid signaling sequences adopt the cross-β fold, which can self-replicate through templating. Propagation of the amyloid fold from the receptor to the effector protein is used for signal transduction in immune response pathways in animals, fungi, and bacteria. So far, about a dozen families of amyloid signaling motifs (ASMs) have been classified. Because ASMs vary widely, they are difficult to identify in the large protein databases now available, which limits the scope for experimental studies. To date, various deep learning (DL) models have been applied across a range of protein-related tasks, including domain family classification and the prediction of protein structure and protein–protein interactions.

In this study, we develop tailor-made bidirectional LSTM and BERT-based architectures to model ASMs, and compare their performance against a state-of-the-art machine learning grammatical model. We focus on developing a discriminative model of generalized ASMs that can detect ASMs in large datasets. The DL-based models are trained on a diverse set of motif families and a global negative set, and are then used to identify ASMs from remotely related families. We analyze how both models represent the data and demonstrate that the DL-based approaches effectively detect ASMs, including novel motifs, even at the genome scale.
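As a rough illustration of what detecting short motifs in long protein sequences can involve, the sketch below slides a fixed-length window over a sequence and keeps the best-scoring fragment. This is an assumption about the general scanning pattern, not a description of the authors' implementation: the function names, the window size of 40, and the `toy_score` function are all hypothetical, and `toy_score` is a placeholder standing in for a trained BiLSTM or BERT-based classifier.

```python
from typing import Callable, List, Tuple


def windows(seq: str, size: int) -> List[Tuple[int, str]]:
    """Return (start, fragment) pairs for every window of length `size`.

    Sequences no longer than `size` yield one whole-sequence fragment.
    """
    if len(seq) <= size:
        return [(0, seq)]
    return [(i, seq[i:i + size]) for i in range(len(seq) - size + 1)]


def scan(
    seq: str,
    score_fragment: Callable[[str], float],
    size: int = 40,  # hypothetical window length, not taken from the paper
) -> Tuple[float, int, str]:
    """Score every window and return (best_score, start, fragment)."""
    scored = (
        (score_fragment(frag), start, frag) for start, frag in windows(seq, size)
    )
    # max() keeps the first window among those with the highest score.
    return max(scored, key=lambda item: item[0])


def toy_score(fragment: str) -> float:
    """Placeholder scorer: fraction of G, N, and Q residues.

    NOT a trained model; a real scan would call a classifier here.
    """
    if not fragment:
        return 0.0
    return sum(fragment.count(aa) for aa in "GNQ") / len(fragment)


if __name__ == "__main__":
    score, start, fragment = scan("AAAAGNQGNQAAAA", toy_score, size=6)
    print(f"best window at {start}: {fragment} (score {score:.2f})")
```

Passing the scorer in as a callable keeps the scanning loop independent of the model, so the same loop could wrap either of the two architectures, or a thresholding step, without changes.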

The models are provided as a Python package, asmscan-bilstm, and a Docker image at https://github.com/chrispysz/asmscan-proteinbert-run. The source code can be accessed at https://github.com/jakub-galazka/asmscan-bilstm and https://github.com/chrispysz/asmscan-proteinbert. Data and results are at https://github.com/wdyrka-pwr/ASMscan.

## Full-text entities

- **Diseases:** amyloid (MESH:C000718787)

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12261475/full.md

---
Source: https://tomesphere.com/paper/PMC12261475