# Challenges in Applying DNA-Binding Protein Predictors to Biological Research

**Authors:** Graydon Cowgill, Steven Anthony Strazza, Savannah Wilson, Ranjeeta Odari, Sadia Afrin Bristy, Yongjian Qiu, Sayaka Miura

PMC · DOI: 10.3390/ijms26199785 · 2025-10-08

## TL;DR

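This paper evaluates DNA-binding protein prediction tools on five real-world case studies and finds that they are not yet reliable enough for empirical biological research: many web servers are poorly maintained, and the tools that do run produce misclassifications that their prediction scores fail to flag.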

## Contribution

The study provides a critical evaluation of both the availability and the predictive performance of existing DNA-binding protein prediction tools, using five real-world case studies.

## Key findings

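- Most DNA-binding prediction tools are web-based, which makes them accessible to researchers without computational expertise, but many are poorly maintained: server connection problems, input errors, and long processing times were frequent.
- Among the ten tools that were functional and practical, prediction scores often failed to reflect incorrect outputs, and multiple methods frequently produced the same erroneous predictions.
- Even a small number of misclassifications can significantly distort biological interpretation.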
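
The second finding explains why combining tools does not necessarily help: if several predictors make the same mistake, a consensus inherits it. The following is a minimal sketch of that reasoning, not the authors' evaluation procedure. The tool names, protein names, and labels are hypothetical toy data invented for illustration, and `shared_errors` and `majority_vote` are illustrative helpers rather than part of any published tool.

```python
from collections import Counter

# Hypothetical toy data (not from the paper): 1 = DNA-binding, 0 = not.
truth = {"protein_1": 1, "protein_2": 0, "protein_3": 1, "protein_4": 0}
predictions = {
    "tool_A": {"protein_1": 1, "protein_2": 1, "protein_3": 1, "protein_4": 0},
    "tool_B": {"protein_1": 1, "protein_2": 1, "protein_3": 0, "protein_4": 0},
    "tool_C": {"protein_1": 1, "protein_2": 1, "protein_3": 1, "protein_4": 0},
}


def shared_errors(truth, predictions):
    """Count, per protein, how many tools disagree with the known label."""
    wrong = Counter()
    for tool_calls in predictions.values():
        for protein, label in truth.items():
            if tool_calls[protein] != label:
                wrong[protein] += 1
    return dict(wrong)


def majority_vote(predictions, protein):
    """Return 1 if more than half of the tools call the protein DNA-binding."""
    votes = [calls[protein] for calls in predictions.values()]
    return int(sum(votes) * 2 > len(votes))


errors = shared_errors(truth, predictions)
consensus = {p: majority_vote(predictions, p) for p in truth}
# Proteins where even the consensus call contradicts the known label.
consensus_wrong = [p for p in truth if consensus[p] != truth[p]]

print("tools wrong per protein:", errors)
print("consensus still wrong for:", consensus_wrong)
```

In this toy case the isolated error on `protein_3` is outvoted, while the error on `protein_2`, shared by all three tools, survives the vote. Agreement among tools is therefore weak evidence of correctness when their errors are correlated.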

## Abstract

DNA-binding proteins play a crucial role in regulating gene expression, DNA replication, and chromatin organization. While many DNA-binding proteins have been identified, many unique DNA-binding proteins in non-model organisms and recently evolved lineage- or species-specific proteins remain uncharacterized or often lack experimental validation. In addition, genetic variants may alter previously known DNA-binding proteins, leading to loss of binding ability. To address this gap, various computational tools have been developed to predict DNA-binding proteins from protein sequences or structures. Yet, their real-world utility in biological research remains uncertain. To evaluate their effectiveness, we assessed the availability and predictive performance of existing tools using five real-world case studies. We found that most tools were web-based, offering accessibility to researchers without computational expertise. However, many suffered from poor maintenance, including frequent server connection problems, input errors, and long processing times. Among the ten tools that were functional and practical, we found that prediction scores often failed to reflect incorrect outputs, and multiple methods frequently produced the same erroneous predictions. Overall, even a small number of misclassifications can significantly distort biological interpretation, indicating that current DNA-binding prediction tools are not yet sufficiently reliable for empirical research.

## Full-text entities

- **Genes:** TP53 (tumor protein p53) [NCBI Gene 7157] {aka BCC7, BMFS5, LFS1, P53, TRP53}, MB (myoglobin) [NCBI Gene 4151] {aka MYOSB, PVALB}, FTO (FTO alpha-ketoglutarate dependent dioxygenase) [NCBI Gene 79068] {aka ALKBH9, BMIQ14, GDFD, IFEX9}, EZH2 (enhancer of zeste 2 polycomb repressive complex 2 subunit) [NCBI Gene 2146] {aka ENX-1, ENX1, EZH2b, KMT6, KMT6A, WVS}, TARDBP (TAR DNA binding protein) [NCBI Gene 23435] {aka ALS10, TDP-43}, ZNF436 (zinc finger protein 436) [NCBI Gene 80818] {aka ZNF, Zfp46}, DBP (D-box binding PAR bZIP transcription factor) [NCBI Gene 1628] {aka DABP, taxREB302}, FOXP2 (forkhead box P2) [NCBI Gene 93986] {aka CAGH44, SPCH1, TNRC10}
- **Diseases:** speech and language disorders (MESH:D001072), tumor suppressor (OMIM:601308), Cancer (MESH:D009369), injury to (MESH:D014947)
- **Chemicals:** dipeptide (MESH:D004151), DP (MESH:D004176), AA (MESH:D000596), lactose (MESH:D007785), NA (MESH:D012964), acid (MESH:D000143), SA (MESH:D000077145), DP-Bind (-)
- **Species:** Homo sapiens (human, species) [taxon 9606], Escherichia coli (E. coli, species) [taxon 562], Arabidopsis thaliana (mouse-ear cress, species) [taxon 3702]
- **Mutations:** R175H, R248W, R273H, R553H

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12524727/full.md

---
Source: https://tomesphere.com/paper/PMC12524727