# Relational subgraphs fused with complete subgraphs based on the knowledge graph for mining protein complexes

**Authors:** Ruixue Zhao, Dandan Zhang, Yuantao Kou, Guojian Xian, Xiao Yang

PMC · DOI: 10.1038/s41598-025-18281-7 · Scientific Reports · 2025-10-29

## TL;DR

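The paper proposes a method for discovering protein complexes in Arabidopsis thaliana: it predicts protein-protein interactions from relational subgraphs of a knowledge graph, then mines complete subgraphs of the predicted interactions as candidate complexes.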

## Contribution

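The paper contributes a knowledge-mining approach that fuses relational subgraphs, used to predict protein-protein interactions, with complete subgraphs, used to extract protein complexes, over a knowledge graph built from UniProt and PlaPPISite data.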

## Key findings

- A knowledge graph of interacting Arabidopsis thaliana proteins, built from UniProt and PlaPPISite, contains 68,713 nodes and 109,496 semantic relationships.
- The model predicted 1,232 protein-protein interactions, of which 682 (about 55%) matched experimentally validated interactions recorded in STRING and BioGrid.
- Mining complete subgraphs of the predicted interactions identified 336 protein complexes.
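
The comparison against reference databases can be illustrated with a small sketch. It is not the authors' pipeline: the function names, the protein identifiers, and the two toy reference lists are hypothetical, and the only assumption taken from the summary is that a predicted interaction counts as confirmed when the same unordered protein pair appears in at least one reference set.

```python
def normalize(pairs):
    """Store each interaction as an unordered pair so A-B equals B-A."""
    return {frozenset(p) for p in pairs if p[0] != p[1]}


def confirmed_fraction(predicted, *references):
    """Return the confirmed pairs and their share of all predictions."""
    pred = normalize(predicted)
    known = set().union(*(normalize(r) for r in references))
    confirmed = pred & known
    share = len(confirmed) / len(pred) if pred else 0.0
    return confirmed, share


# Hypothetical toy data, not taken from the paper.
predicted = [("P1", "P2"), ("P2", "P3"), ("P3", "P4"), ("P4", "P1")]
reference_a = [("P2", "P1")]                # stands in for one database
reference_b = [("P3", "P2"), ("P9", "P8")]  # stands in for another

confirmed, share = confirmed_fraction(predicted, reference_a, reference_b)
print(len(confirmed), share)
```

Applied to the reported counts, the same ratio is 682 / 1,232, or roughly 0.554.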

## Abstract

The potential discovery of protein complexes can elucidate the structure of protein-protein interaction networks and identify downstream regulatory genes. Given the complexity of protein-protein interactions, interpretable domain knowledge discovery has gained significant attention. In this study, we constructed a knowledge graph for interacting proteins by gathering data from UniProt and PlaPPISite databases related to the model plant Arabidopsis thaliana. We developed a relational subgraph-driven protein-protein interaction prediction model based on this knowledge graph to predict interactions within connected subgraphs. Subsequently, complete subgraphs of interacting proteins were extracted, enabling the potential discovery of protein complex structures. The knowledge graph consisted of 68,713 nodes and 109,496 semantic relationships. A total of 1,232 protein-protein interactions were predicted. Comparison with experimentally validated interactions recorded in the STRING and BioGrid databases revealed that 682 of these interactions were confirmed. Based on the predicted interactions, 336 protein complexes were identified by mining the complete subgraphs. The proposed knowledge mining method, which integrates relational subgraphs and complete subgraphs, facilitates the discovery of protein complexes and provides a novel approach for analyzing their structures and identifying downstream genes.

The online version contains supplementary material available at 10.1038/s41598-025-18281-7.
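
The complete-subgraph step described in the abstract can be sketched as maximal-clique enumeration over the predicted interaction graph. The summary does not say which algorithm the authors used, so the Bron-Kerbosch procedure with pivoting below is a swapped-in, standard choice; the helper names, the example edges, and the minimum complex size of three proteins are all assumptions made for illustration.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Build an undirected adjacency map from (protein, protein) pairs."""
    adj = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-interactions
            adj[a].add(b)
            adj[b].add(a)
    return adj


def maximal_cliques(adj):
    """Yield every maximal complete subgraph (Bron-Kerbosch with pivoting)."""
    def expand(r, p, x):
        if not p and not x:
            yield frozenset(r)
            return
        # Pivot on the vertex with the most neighbours among the candidates.
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            yield from expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    yield from expand(set(), set(adj), set())


# Hypothetical predicted interactions: two triangles joined by the edge C-D.
edges = [("A", "B"), ("A", "C"), ("B", "C"),
         ("C", "D"),
         ("D", "E"), ("D", "F"), ("E", "F")]

MIN_SIZE = 3  # assumed threshold; a single edge is not treated as a complex
adj = build_adjacency(edges)
complexes = sorted(sorted(c) for c in maximal_cliques(adj) if len(c) >= MIN_SIZE)
print(complexes)  # [['A', 'B', 'C'], ['D', 'E', 'F']]
```

Here the bridging pair C-D is itself a maximal clique but falls below the assumed size threshold, so only the two triangles are reported as candidate complexes.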

## Linked entities

- **Species:** Arabidopsis thaliana (taxon 3702)

## Full-text entities

- **Species:** Arabidopsis thaliana (mouse-ear cress, species) [taxon 3702]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12572110/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12572110/full.md

## References

3 references — full list in the complete paper: https://tomesphere.com/paper/PMC12572110/full.md

---
Source: https://tomesphere.com/paper/PMC12572110