# EGPDI: identifying protein–DNA binding sites based on multi-view graph embedding fusion

**Authors:** Mengxin Zheng, Guicong Sun, Xueping Li, Yongxian Fan

PMC · DOI: 10.1093/bib/bbae330 · Briefings in Bioinformatics · 2024-07-08

## TL;DR

This paper introduces EGPDI, a method that predicts protein-DNA binding sites by fusing node embeddings from two graph neural networks (EGNN and GCNII) with a gated multi-head attention mechanism.

## Contribution

The paper applies multi-view graph embedding fusion to protein-DNA binding site prediction; to the authors' knowledge, this is the first such application.

## Key findings

- EGPDI outperforms state-of-the-art methods in both five-fold cross-validation and independent testing.
- Fusing the independently configured EGNN and GCNII embeddings with gated multi-head attention combines global and local node representations.
- Further comparative experiments and case studies support the method's superiority and generalization ability.
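
For readers unfamiliar with the two backbones named above, the layer updates below follow the original EGNN and GCNII publications. They are background only: this summary does not state how EGPDI configures either network, so depth, hyperparameters, and edge features are left open here. The symbols ($h_i$ for node features, $x_i$ for coordinates, $a_{ij}$ for edge attributes, $\phi_e, \phi_x, \phi_h$ for learned functions, $C$ for a normalization constant, $\tilde{P}$ for the normalized adjacency matrix with self-loops, $\alpha_\ell$ and $\beta_\ell$ for the GCNII mixing coefficients) come from those publications, not from this page.

An EGNN layer updates features and coordinates together, which keeps the output equivariant to rotations and translations of the input structure:

$$
\begin{aligned}
m_{ij} &= \phi_e\left(h_i^{\ell},\, h_j^{\ell},\, \lVert x_i^{\ell} - x_j^{\ell} \rVert^2,\, a_{ij}\right) \\
x_i^{\ell+1} &= x_i^{\ell} + C \sum_{j \neq i} \left(x_i^{\ell} - x_j^{\ell}\right) \phi_x(m_{ij}) \\
m_i &= \sum_{j \neq i} m_{ij} \\
h_i^{\ell+1} &= \phi_h\left(h_i^{\ell},\, m_i\right)
\end{aligned}
$$

A GCNII layer adds an initial residual connection to $H^{(0)}$ and an identity mapping on the weights, which is what lets the network be stacked deeply without over-smoothing:

$$
H^{(\ell+1)} = \sigma\left(\left((1-\alpha_\ell)\,\tilde{P} H^{(\ell)} + \alpha_\ell H^{(0)}\right)\left((1-\beta_\ell)\, I + \beta_\ell W^{(\ell)}\right)\right)
$$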

## Abstract

Mechanisms of protein-DNA interactions are involved in a wide range of biological activities and processes. Accurately identifying binding sites between proteins and DNA is crucial for analyzing genetic material, exploring protein functions, and designing novel drugs. In recent years, several computational methods have been proposed as alternatives to time-consuming and expensive traditional experiments. However, accurately predicting protein-DNA binding sites remains a challenge. Existing computational methods often rely on handcrafted features and a single-model architecture, leaving room for improvement. We propose a novel computational method, called EGPDI, based on multi-view graph embedding fusion. This approach integrates Equivariant Graph Neural Networks (EGNN) and Graph Convolutional Networks II (GCNII), independently configured to mine the global and local node embedding representations. An advanced gated multi-head attention mechanism is subsequently employed to capture the attention weights of the dual embedding representations, thereby facilitating the integration of node features. In addition, extra node features from protein language models are introduced to provide more structural information. To our knowledge, this is the first time that multi-view graph embedding fusion has been applied to the task of protein-DNA binding site prediction. The results of five-fold cross-validation and independent testing demonstrate that EGPDI outperforms state-of-the-art methods. Further comparative experiments and case studies also verify the superiority and generalization ability of EGPDI.
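
The fusion step described in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: the summary gives no equations for the gated multi-head attention, so the particular form used here (per-head softmax attention over the two views, followed by an element-wise sigmoid gate) is an assumption, and every name in it (`fuse_views`, `W_q`, `W_k`, `W_v`, `W_g`, the dimensions, the random weights) is hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(scores):
    # Subtract the maximum for numerical stability.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(matrix, vector):
    return [dot(row, vector) for row in matrix]


def random_matrix(rows, cols, rng):
    # Stand-in for learned weights; a real model would train these.
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def fuse_views(h_a, h_b, params, num_heads):
    """Fuse two embeddings of one residue (hypothetical formulation).

    Each head computes softmax attention weights over the two views,
    and a sigmoid gate then scales the attended vector element-wise.
    """
    dim = len(h_a)
    if dim % num_heads != 0:
        raise ValueError("embedding size must be divisible by num_heads")
    head_dim = dim // num_heads
    views = [h_a, h_b]

    # Assumption: the query is built from the sum of both views.
    query = matvec(params["W_q"], [a + b for a, b in zip(h_a, h_b)])
    keys = [matvec(params["W_k"], v) for v in views]
    values = [matvec(params["W_v"], v) for v in views]

    attended, weights_per_head = [], []
    for head in range(num_heads):
        lo, hi = head * head_dim, (head + 1) * head_dim
        scores = [dot(query[lo:hi], k[lo:hi]) / math.sqrt(head_dim) for k in keys]
        weights = softmax(scores)  # one weight per view, summing to 1
        weights_per_head.append(weights)
        for d in range(lo, hi):
            attended.append(sum(w * v[d] for w, v in zip(weights, values)))

    # Gate computed from the concatenation of both views (list concatenation).
    gate = [sigmoid(g) for g in matvec(params["W_g"], h_a + h_b)]
    fused = [g * a for g, a in zip(gate, attended)]
    return fused, weights_per_head


rng = random.Random(0)
dim, num_heads = 8, 2
params = {
    "W_q": random_matrix(dim, dim, rng),
    "W_k": random_matrix(dim, dim, rng),
    "W_v": random_matrix(dim, dim, rng),
    "W_g": random_matrix(dim, 2 * dim, rng),
}
h_egnn = [rng.uniform(-1, 1) for _ in range(dim)]   # toy stand-in for an EGNN embedding
h_gcnii = [rng.uniform(-1, 1) for _ in range(dim)]  # toy stand-in for a GCNII embedding
fused, weights = fuse_views(h_egnn, h_gcnii, params, num_heads)
```

Here `weights` holds, for each head, how much that head draws from each of the two views, which is the kind of per-view attention weight the abstract refers to. In a full model the fused vector would feed a per-residue classifier; that stage is omitted because the summary does not describe it.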

## Full-text entities

- **Diseases:** EGCL (MESH:D016369)
- **Chemicals:** acid (MESH:D000143)
- **Cell lines:** S2 — Drosophila melanogaster (Fruit fly), Spontaneously immortalized cell line (CVCL_Z232)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC11229037/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC11229037/full.md

## References

47 references — full list in the complete paper: https://tomesphere.com/paper/PMC11229037/full.md

---
Source: https://tomesphere.com/paper/PMC11229037