# PepLM-GNN: A graph neural network framework leveraging pre-trained language models for peptide-protein binding prediction

**Authors:** Ke Yan, Meijing Li, Shutao Chen, Tianyi Liu, Jing Hao, Bin Liu, Zhen Li

PMC · DOI: 10.1371/journal.pcbi.1014084 · PLOS Computational Biology · 2026-03-24

## TL;DR

PepLM-GNN is a new framework that combines language models and graph networks to accurately predict how peptides bind to proteins, which is important for drug development.

## Contribution

PepLM-GNN introduces a novel hybrid graph network that integrates pre-trained language models with graph convolutional and isomorphism networks for improved peptide-protein interaction prediction.

## Key findings

- PepLM-GNN outperforms existing methods in predicting peptide-protein interactions with higher accuracy and robustness.
- The framework effectively handles non-Euclidean data and cold-start scenarios using semantic features and graph-based modeling.
- PepLM-GNN is successfully applied to virtual peptide drug screening, aiding in drug discovery and protein function elucidation.

## Abstract

Accurate prediction of peptide-protein interactions (PepPI) underpins progress in peptide drug research and helps explain the regulatory mechanisms of biomolecules. Researchers have developed several computational methods to predict PepPI, but these methods have significant limitations. In terms of feature characterisation, PepPI data are non-Euclidean, making it difficult for conventional prediction methods to measure the underlying correlations between peptides and proteins. In terms of generalisation, existing approaches degrade markedly in cold-start scenarios involving novel peptides, novel proteins, and novel binding pairs.
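The three cold-start scenarios named above can be made concrete with a small data-splitting sketch. This is illustrative only and is not the authors' evaluation protocol: the function name `cold_start_split`, the held-out sets, and the reading of "novel pair" as "both the peptide and the protein are unseen in training" are assumptions made for this example.

```python
from typing import Dict, Iterable, List, Set, Tuple

Pair = Tuple[str, str]  # (peptide_id, protein_id); the IDs are hypothetical


def cold_start_split(
    pairs: Iterable[Pair],
    held_out_peptides: Set[str],
    held_out_proteins: Set[str],
) -> Dict[str, List[Pair]]:
    """Partition interaction pairs into a training set and three cold-start test sets.

    Assumption: a "novel pair" is one where both entities are held out.
    """
    splits: Dict[str, List[Pair]] = {
        "train": [],
        "novel_peptide": [],
        "novel_protein": [],
        "novel_pair": [],
    }
    for peptide, protein in pairs:
        peptide_unseen = peptide in held_out_peptides
        protein_unseen = protein in held_out_proteins
        if peptide_unseen and protein_unseen:
            splits["novel_pair"].append((peptide, protein))
        elif peptide_unseen:
            splits["novel_peptide"].append((peptide, protein))
        elif protein_unseen:
            splits["novel_protein"].append((peptide, protein))
        else:
            splits["train"].append((peptide, protein))
    return splits
```

Under this split, no held-out peptide or protein appears in the training set, so each test set measures generalisation to entities the model has never seen.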

In this study, we propose a computational framework, PepLM-GNN, that integrates the pre-trained ProtT5 language model with a hybrid graph network for accurate identification of PepPI. The model constructs a graph in which ProtT5-extracted semantic context features of peptides and proteins form heterogeneous nodes, and edges connect interacting peptide-protein pairs. Within the hybrid graph network, a Graph Convolutional Network (GCN) provides comprehensive information about the peptide and protein sequences, while a Graph Isomorphism Network (GIN) captures the global interactions between them. Specifically, the GCN aggregates both the semantic context information of node sequences and local neighbourhood information, effectively representing non-Euclidean data. To capture global associations, we adopt a GIN strategy to optimise cross-node feature interaction and transfer, thereby enhancing generalisation in cold-start scenarios. Compared with existing advanced methods, PepLM-GNN demonstrated highly accurate and robust PepPI prediction. We further demonstrated the capabilities of PepLM-GNN in virtual peptide drug screening, which is expected to facilitate the discovery of peptide drugs and the elucidation of protein functions.
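As a rough illustration of the graph construction and GCN aggregation described above, the following sketch builds a peptide-protein adjacency matrix and applies one standard GCN propagation step (self-loops, symmetric degree normalisation, ReLU). It is a minimal sketch under stated assumptions, not the PepLM-GNN implementation: the helper names, the node ordering (peptides first, then proteins), and the toy feature vectors standing in for ProtT5 embeddings are all hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain dense matrix product."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def build_adjacency(n_pep: int, n_prot: int, edges: List[Tuple[int, int]]) -> Matrix:
    """Undirected graph: peptide nodes come first, protein nodes follow.

    Each edge is (peptide_index, protein_index) for an interacting pair.
    """
    n = n_pep + n_prot
    adj = [[0.0] * n for _ in range(n)]
    for pep, prot in edges:
        j = n_pep + prot  # offset protein indices past the peptide block
        adj[pep][j] = 1.0
        adj[j][pep] = 1.0
    return adj


def gcn_layer(adj: Matrix, feats: Matrix, weight: Matrix) -> Matrix:
    """One GCN step: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    n = len(adj)
    # Add self-loops so each node keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    norm = [
        [a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
        for i in range(n)
    ]
    out = matmul(matmul(norm, feats), weight)
    return [[max(0.0, v) for v in row] for row in out]
```

In a real pipeline the rows of `feats` would be sequence embeddings and `weight` would be learned; here both are supplied by the caller so the propagation rule can be inspected in isolation.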

We propose a computational framework, PepLM-GNN, that integrates the ProtT5 pre-trained language model with a hybrid graph network. Specifically, the semantic features of peptides and proteins are extracted using ProtT5 to construct a graph. Within the hybrid graph network, the GCN model aggregates semantic and local neighborhood information from node sequences, enabling an adequate representation of non-Euclidean data. Meanwhile, the GIN model is used to optimize cross-node feature interaction and transmission, thereby enhancing generalization in cold-start scenarios. Experimental results demonstrate that PepLM-GNN outperforms existing state-of-the-art methods in both accuracy and robustness for PepPI prediction. Moreover, PepLM-GNN can be applied to virtual peptide drug screening, thereby accelerating the development of peptide drugs. Furthermore, we have established a public online service platform (http://bliulab.net/PepLM-GNN) to facilitate practical application.
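For readers unfamiliar with GIN, the cross-node feature transfer mentioned above can be sketched with the standard GIN update, in which each node sums its neighbours' features, adds its own features scaled by (1 + eps), and passes the result through a small MLP. This is a generic sketch of that rule, not the configuration used in PepLM-GNN: the function name `gin_layer`, the two-layer MLP without biases, and the weights `w1` and `w2` are assumptions for illustration.

```python
from typing import List

Matrix = List[List[float]]


def gin_layer(
    neighbors: List[List[int]],
    feats: Matrix,
    w1: Matrix,
    w2: Matrix,
    eps: float = 0.0,
) -> Matrix:
    """One GIN step: h_v' = MLP((1 + eps) * h_v + sum of h_u over neighbours u).

    neighbors[v] lists the indices of the nodes adjacent to v.
    The MLP here is linear -> ReLU -> linear, with no bias terms.
    """
    out: Matrix = []
    for v, h_v in enumerate(feats):
        # Scale the node's own features, then add unnormalised neighbour sums.
        combined = [(1.0 + eps) * x for x in h_v]
        for u in neighbors[v]:
            for d, x in enumerate(feats[u]):
                combined[d] += x
        hidden = [
            max(0.0, sum(combined[k] * w1[k][j] for k in range(len(combined))))
            for j in range(len(w1[0]))
        ]
        out.append(
            [sum(hidden[k] * w2[k][j] for k in range(len(hidden))) for j in range(len(w2[0]))]
        )
    return out
```

Unlike the degree-normalised averaging of a GCN, the sum aggregation preserves neighbourhood size, which is the property that gives GIN its discriminative power over graph structures.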

## Full-text entities

- **Genes:** RBBP4 (RB binding protein 4, chromatin remodeling factor) [NCBI Gene 5928] {aka NURF55, RBAP48, lin-53}, TTC41P (tetratricopeptide repeat domain 41, pseudogene) [NCBI Gene 253724] {aka GNN, GNNP}, CSRP3 (cysteine and glycine rich protein 3) [NCBI Gene 8048] {aka CLP, CMD1M, CMH12, CRP3, MLP}, INS (insulin) [NCBI Gene 3630] {aka IDDM, IDDM1, IDDM2, ILPR, IRDN, MODY10}, SALL4 (spalt like transcription factor 4) [NCBI Gene 57167] {aka DRRS, HSAL4, IVIC, ZNF797}
- **Diseases:** cancer (MESH:D009369), tumorigenesis (MESH:D063646), neurodegenerative diseases (MESH:D019636), HCC (MESH:D006528)
- **Chemicals:** carbon (MESH:D002244), alanine (MESH:D000409), FFW (-), glucose (MESH:D005947)
- **Species:** Saccharomyces cerevisiae (baker's yeast, species) [taxon 4932]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC13012464/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC13012464/full.md

## References

77 references — full list in the complete paper: https://tomesphere.com/paper/PMC13012464/full.md

---
Source: https://tomesphere.com/paper/PMC13012464