Challenges in Applying DNA-Binding Protein Predictors to Biological Research
Graydon Cowgill, Steven Anthony Strazza, Savannah Wilson, Ranjeeta Odari, Sadia Afrin Bristy, Yongjian Qiu, Sayaka Miura

TL;DR
This paper evaluates DNA-binding protein prediction tools and finds they are unreliable for real-world biological research due to technical and accuracy issues.
Contribution
The study provides a critical evaluation of existing DNA-binding protein prediction tools using real-world case studies.
Findings
Most DNA-binding prediction tools are web-based but suffer from poor maintenance and reliability issues.
Prediction scores often fail to flag incorrect outputs, and the same errors recur across multiple methods.
Even minor misclassifications can significantly impact biological interpretations.
Abstract
DNA-binding proteins play a crucial role in regulating gene expression, DNA replication, and chromatin organization. While many DNA-binding proteins have been identified, many unique DNA-binding proteins in non-model organisms and recently evolved lineage- or species-specific proteins remain uncharacterized or often lack experimental validation. In addition, genetic variants may alter previously known DNA-binding proteins, leading to loss of binding ability. To address this gap, various computational tools have been developed to predict DNA-binding proteins from protein sequences or structures. Yet, their real-world utility in biological research remains uncertain. To evaluate their effectiveness, we assessed the availability and predictive performance of existing tools using five real-world case studies. We found that most tools were web-based, offering accessibility to researchers…
Taxonomy
Topics: Machine Learning in Bioinformatics · Genomics and Phylogenetic Studies · RNA and protein synthesis mechanisms
