# iDLDDG: predicting protein stability changes from missense mutations in DNA-binding proteins using integrated deep learning features

**Authors:** Xuan Yu, Fang Ge, Dong-Jun Yu, Zhaohong Deng

PMC · DOI: 10.1093/bib/bbag050 · 2026-02-13

## TL;DR

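This paper introduces iDLDDG, a deep learning model that predicts how missense mutations affect the stability of DNA-binding proteins. It reaches a 10-fold cross-validation Pearson correlation coefficient (PCC) of 0.755 on MPD276, and is intended to support research into disease mechanisms and therapy development.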

## Contribution

The authors present iDLDDG as the first computational framework to rigorously differentiate mutation mechanisms in double-stranded and single-stranded DNA-binding proteins (DSBs and SSBs), using integrated deep learning features.

## Key findings

- iDLDDG achieves a Pearson correlation coefficient (PCC) of 0.755 in 10-fold cross-validation on MPD276 and 0.632 on independent test sets covering both DSBs and SSBs.
- The model integrates multi-scale structural and evolutionary information via a multi-channel architecture.
- An entropy-based algorithm identified 181 optimal residues for modeling biophysical constraints.
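
The PCC values above measure linear agreement between predicted and experimental values. As a minimal sketch of how a 10-fold cross-validated PCC could be computed, the example below pools out-of-fold predictions and correlates them with the targets. Whether the paper pools predictions or averages per-fold scores is not stated here, so that choice is an assumption. The names `pearson`, `kfold_indices`, `cross_val_pcc`, and the toy `fit_line` model are illustrative and do not come from the paper.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("PCC is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def kfold_indices(n, k=10, seed=0):
    """Shuffle indices 0..n-1 and split them into k disjoint folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_val_pcc(features, targets, fit, k=10, seed=0):
    """Pooled out-of-fold PCC (assumed protocol, not confirmed by the paper).

    `fit(train_x, train_y)` must return a callable that predicts one sample.
    """
    predictions = [0.0] * len(targets)
    for fold in kfold_indices(len(targets), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(targets)) if i not in held_out]
        model = fit([features[i] for i in train], [targets[i] for i in train])
        for i in fold:
            predictions[i] = model(features[i])
    return pearson(predictions, targets)


def fit_line(xs, ys):
    """Toy stand-in model: one-feature least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept
```

Calling `cross_val_pcc(xs, ys, fit_line)` on perfectly linear toy data returns a value very close to 1; a real evaluation would replace `fit_line` with the trained predictor and the toy data with measured values.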

## Abstract

To understand disease mechanisms and advance therapies, accurately predicting how missense mutations alter protein–DNA binding affinity is critical. Many existing models neglect the unique characteristics of missense mutations in both double-stranded DNA-binding proteins (DSBs) and single-stranded DNA-binding proteins (SSBs). To address this issue, we constructed a comprehensive dataset from diverse sources. By leveraging sequence-based embeddings from pretrained protein language models including ESM2, ProtTrans, and ESM1v, we developed iDLDDG, a deep learning framework that integrates multi-scale structural and evolutionary information via a multi-channel architecture. To balance residue-wise information density against entropy, our entropy-based algorithm determined 181 residues as optimal for modeling biophysical constraints. This approach enhances predictive accuracy and computational efficiency, thereby supporting large-scale assessments of mutation effects in DNA-binding proteins. iDLDDG achieves state-of-the-art performance, with a 10-fold cross-validation PCC of 0.755 on MPD276 and 0.632 on independent test sets encompassing both DSBs and SSBs, significantly surpassing existing methods. By establishing the first computational framework that rigorously differentiates DSB and SSB mutation mechanisms, our work provides a foundation for high-accuracy prediction of pathological mutations in DNA-binding proteins.
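
The abstract reports that an entropy-based algorithm selected 181 residues but does not describe the procedure. Purely as an illustration, the sketch below assumes the 181 residues form a window centered on the mutation site, padded where the sequence is too short, and shows how the Shannon entropy of a window's residue composition could be measured. The centering, the `"X"` padding character, and the function names `centered_window` and `shannon_entropy` are all assumptions, not the authors' method.

```python
import math
from collections import Counter


def centered_window(sequence, position, length=181, pad="X"):
    """Return `length` residues centered on 0-based `position`.

    Positions falling outside the sequence are filled with `pad`.
    The window layout is an assumption; the paper may select residues differently.
    """
    if length % 2 == 0:
        raise ValueError("length must be odd so the window has a center")
    if not 0 <= position < len(sequence):
        raise ValueError("position is outside the sequence")
    half = length // 2
    left = position - half
    right = position + half + 1
    prefix = pad * max(0, -left)
    suffix = pad * max(0, right - len(sequence))
    core = sequence[max(0, left):min(len(sequence), right)]
    return prefix + core + suffix


def shannon_entropy(window, pad="X"):
    """Shannon entropy (bits) of the residue composition, ignoring padding."""
    residues = [c for c in window if c != pad]
    if not residues:
        return 0.0
    total = len(residues)
    counts = Counter(residues)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

For example, `centered_window("ACDEFGHIK", 4, length=5)` yields `"DEFGH"`, and a candidate window length could be scored by computing `shannon_entropy` over windows of that length.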

## Full-text entities

- **Genes:** PADI1 (peptidyl arginine deiminase 1) [NCBI Gene 29943] {aka HPAD10, PAD1, PDI, PDI1}, CRYGD (crystallin gamma D) [NCBI Gene 1421] {aka CACA, CCA3, CCP, CRYG4, CTRCT4, PCC}, SSB (small RNA binding exonuclease protection factor La) [NCBI Gene 6741] {aka LARP3, La, La/SSB, SSB/La}
- **Diseases:** PDIs (MESH:C563663), DSBs (MESH:C563602)
- **Chemicals:** alanine (MESH:D000409), acid (MESH:D000143), hydrogen (MESH:D006859), amino acid (MESH:D000596), DSB (-)
- **Mutations:** F52A, G from 324, R117A, G133Q, D130K, V170A, R145A, Y329A, R22A, N139A

## Figures

8 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12903960/full.md

---
Source: https://tomesphere.com/paper/PMC12903960