# IDRdecoder: a machine learning approach for rational drug discovery toward intrinsically disordered regions

**Authors:** Clara Shionyu-Mitusyama, Satoshi Ohmori, Subaru Hirata, Hirokazu Ishida, Tsuyoshi Shirai

PMC · DOI: 10.3389/fbinf.2025.1627836 · Frontiers in Bioinformatics · 2025-07-18

## TL;DR

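IDRdecoder is a machine learning tool that predicts drug interaction sites and interacting ligand types in intrinsically disordered regions (IDRs) of proteins, which have traditionally been overlooked as drug targets despite their involvement in disease.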

## Contribution

IDRdecoder is the first method to predict drug interaction sites and ligands in intrinsically disordered regions using transfer learning.

## Key findings

- IDRdecoder achieved AUC scores of 0.616 for drug interaction sites and 0.702 for ligand types.
- Tyr and Ala are preferred target sites in IDRs, and flexible ligand substructures like alkyl groups are favored.
- The model showed moderately improved performance over existing methods, including ProteinBERT, in predicting IDR-related drug interactions.

## Abstract

Intrinsically disordered regions (IDRs) of proteins have traditionally been overlooked as drug targets. However, with growing recognition of their crucial role in biological activity and their involvement in various diseases, IDRs have emerged as promising targets for drug discovery. Despite this potential, rational methodologies for IDR-targeted drug discovery remain underdeveloped, primarily due to a lack of reference experimental data.

This study explores a machine learning approach to predict IDR functions, drug interaction sites, and interacting molecular substructures within IDR sequences. To address the data gap, stepwise transfer learning was employed: IDRdecoder sequentially generates predictions for IDR classification, interaction sites, and interacting ligand substructures. In the first step, the neural network was trained as an autoencoder on 26,480,862 predicted IDR sequences. It was then further trained, via transfer learning, on 57,692 ligand-binding PDB sequences with a higher IDR tendency to predict ligand-interacting sites and ligand types.
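The two-stage idea (unsupervised pretraining, then reuse of the learned encoder for a supervised task) can be illustrated with a toy sketch. This is not IDRdecoder's architecture: it assumes one-hot residue encoding, a single tanh layer as the encoder, softmax reconstruction as the autoencoder objective, and a per-residue logistic head for interaction sites. The names `pretrain_autoencoder`, `fine_tune_head`, and `predict_sites` are hypothetical, and all data and labels are synthetic.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues (assumed alphabet)


def one_hot(seq):
    """Encode a protein sequence as an (L, 20) one-hot matrix."""
    x = np.zeros((len(seq), len(AMINO_ACIDS)))
    for i, aa in enumerate(seq):
        x[i, AMINO_ACIDS.index(aa)] = 1.0
    return x


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def pretrain_autoencoder(X, dim=8, lr=0.5, epochs=300, seed=0):
    """Stage 1 (hypothetical): learn an encoder by reconstructing unlabeled residues."""
    rng = np.random.default_rng(seed)
    W_enc = rng.normal(0.0, 0.1, (X.shape[1], dim))
    W_dec = rng.normal(0.0, 0.1, (dim, X.shape[1]))
    n = len(X)
    losses = []
    for _ in range(epochs):
        h = np.tanh(X @ W_enc)            # per-residue encoding
        p = softmax(h @ W_dec)            # reconstructed residue distribution
        losses.append(-np.sum(X * np.log(p + 1e-12)) / n)
        d_logits = (p - X) / n            # gradient of softmax cross-entropy
        d_pre = (d_logits @ W_dec.T) * (1.0 - h ** 2)  # back through tanh
        W_dec -= lr * (h.T @ d_logits)
        W_enc -= lr * (X.T @ d_pre)
    return W_enc, losses


def fine_tune_head(X, y, W_enc, lr=1.0, epochs=500):
    """Stage 2 (hypothetical): keep the pretrained encoder frozen, train a logistic head."""
    h = np.tanh(X @ W_enc)                # W_enc is only read, never updated
    w = np.zeros(h.shape[1])
    b = 0.0
    n = len(y)
    for _ in range(epochs):
        q = 1.0 / (1.0 + np.exp(-(h @ w + b)))  # per-residue site probability
        dz = (q - y) / n
        w -= lr * (h.T @ dz)
        b -= lr * dz.sum()
    return w, b


def predict_sites(seq, W_enc, w, b):
    """Per-residue interaction-site probabilities for a new sequence."""
    h = np.tanh(one_hot(seq) @ W_enc)
    return 1.0 / (1.0 + np.exp(-(h @ w + b)))


rng = np.random.default_rng(1)
# Toy data only: random strings stand in for predicted IDRs and PDB chains.
unlabeled = "".join(rng.choice(list(AMINO_ACIDS), size=2000))
W_enc, losses = pretrain_autoencoder(one_hot(unlabeled))

labeled = "".join(rng.choice(list(AMINO_ACIDS), size=500))
y = np.array([1.0 if aa in "YA" else 0.0 for aa in labeled])  # synthetic labels
w, b = fine_tune_head(one_hot(labeled), y, W_enc)
print(predict_sites("MKYAGS", W_enc, w, b).round(2))
```

Freezing the encoder in the second stage is only one design choice; updating all layers during fine-tuning is another, and this summary does not specify which IDRdecoder uses.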

IDRdecoder was evaluated on 9 IDR sequences that have been experimentally characterized as drug targets. In the encoding space, GO terms related to the hypothesized functions of these evaluation sequences were highly enriched. For drug-interacting sites and ligand types, the model achieved areas under the curve (AUC) of 0.616 and 0.702, respectively. Compared with existing methods, including ProteinBERT, IDRdecoder showed moderately improved performance.
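An AUC summarizes how well predicted scores rank true positives (for example, residues that actually contact a drug) above negatives: 0.5 corresponds to random ranking and 1.0 to perfect separation. Assuming the reported values are ROC AUCs over binary labels, a minimal sketch using the equivalent pairwise (Mann-Whitney) formulation is shown below; the function `roc_auc` and the example data are hypothetical and not taken from the paper.

```python
def roc_auc(labels, scores):
    """ROC AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win, matching the Mann-Whitney U formulation.
    Runs in O(P * N), which is fine for illustration but not for large datasets.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical per-residue labels (1 = interaction site) and predicted scores.
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.1]
print(roc_auc(labels, scores))  # 4 of 6 positive/negative pairs ranked correctly
```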

IDRdecoder is the first application for predicting drug interaction sites and ligands in IDR sequences. Analysis of the prediction results revealed characteristics useful for IDR-targeted drug design; for instance, Tyr and Ala are preferred target sites, while flexible substructures, such as alkyl groups, are favored in ligand molecules.
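One simple way to quantify a residue preference such as "Tyr and Ala are preferred target sites" is an enrichment ratio: a residue type's frequency among interaction sites divided by its background frequency in the sequence. The sketch below assumes that definition; it is an illustration, not necessarily the analysis performed in the paper, and `residue_enrichment` and the toy input are hypothetical.

```python
from collections import Counter


def residue_enrichment(seq, site_flags):
    """Ratio of each residue type's frequency at interaction sites to its background frequency.

    Values above 1 mean the residue type is over-represented among sites;
    values below 1 mean it is under-represented.
    """
    if len(seq) != len(site_flags):
        raise ValueError("sequence and flags must have equal length")
    background = Counter(seq)
    at_sites = Counter(aa for aa, flag in zip(seq, site_flags) if flag)
    n_total = len(seq)
    n_sites = sum(at_sites.values())
    if n_sites == 0:
        return {}
    return {
        aa: (at_sites[aa] / n_sites) / (background[aa] / n_total)
        for aa in background
    }


# Toy example: positions 0, 1, and 4 (A, Y, A) are flagged as interaction sites.
print(residue_enrichment("AYGSAYGS", [1, 1, 0, 0, 1, 0, 0, 0]))
```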

## Full-text entities

- **Chemicals:** Tyr (MESH:D014443), Ala (MESH:D000409)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12313641/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12313641/full.md

## References

67 references — full list in the complete paper: https://tomesphere.com/paper/PMC12313641/full.md

---
Source: https://tomesphere.com/paper/PMC12313641