# Predicting Protein–Protein Interactions by Convolutional Neural Network Model

**Authors:** Shuaibo Shi, Ting Xiong, Dong Wang, Lingling Wei, Lin Li, Zhixin Li, Yanfen Lyu

PMC · DOI: 10.3390/biotech15010020 · BioTech · 2026-02-16

## TL;DR

This paper introduces a convolutional neural network (CNN) method that predicts protein–protein interactions (PPIs) from protein and gene sequence features, reporting accuracies between 96.84% and 99.28% on test sets from four species.

## Contribution

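A CNN-based approach that represents each protein pair as a 36 × 32 grayscale feature map. The map combines protein sequence features (APAAC physicochemical properties, positional variation of same-type amino acids, and PSSM-derived evolutionary conservation captured by a newly defined θ interval deviation product factor) with gene sequence features (single nucleotide, dinucleotide, and trinucleotide frequencies, plus positional features obtained by mapping nucleotides onto a unit circle).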

## Key findings

- The method achieved 99.28% accuracy on the Saccharomyces cerevisiae dataset.
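- It outperformed existing computational methods on all four species datasets, with accuracies of 98.15% (Drosophila melanogaster), 98.62% (Homo sapiens), and 96.84% (Mus musculus) in addition to the Saccharomyces cerevisiae result.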
- The approach was also extended to predicting protein–protein interaction networks and non-interaction networks, with results the authors describe as promising.

## Abstract

The study of protein–protein interactions (PPIs) is of significant importance for elucidating biological processes, clarifying pathological mechanisms, and promoting drug development. In this study, we proposed a method to predict PPIs based on protein sequence and gene sequence information, combined with convolutional neural networks (CNNs). First, we extracted three types of features from protein sequence: global physicochemical properties features of the protein sequence, local same type of amino acid position variation features, and protein evolutionary conservation features; simultaneously, we extracted single nucleotide frequency and positional features, dinucleotide frequency features, and trinucleotide frequency features from the corresponding gene sequence. During the feature extraction process, we employed the amphiphilic pseudo amino acid composition (APAAC) method to extract the global hydrophobicity and hydrophilicity features of the protein sequence; we defined a new mathematical descriptor—θ interval deviation product factor—to extract protein evolutionary conservation features from Position Specific Scoring Matrix (PSSM); we also defined a mapping function to map all nucleotides in the gene sequence onto a unit circle, and then extracted nucleotide positional features from the mapped points. Second, based on extracted features, we constructed a 36 × 32 sample feature grayscale map to represent a protein pair sample. Finally, we developed a CNN model to predict PPIs. Our method achieved superior results on four species test sets: an accuracy of 99.28% on the Saccharomyces cerevisiae dataset, 98.15% on the Drosophila melanogaster dataset, 98.62% on the Homo sapiens dataset, and 96.84% on the Mus musculus dataset, outperforming existing computational methods. Furthermore, we extended the application of this method to the prediction of protein–protein interaction networks and non-interaction networks, and also achieved promising results.
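As an illustration of the gene sequence features named in the abstract, the following is a minimal sketch of single nucleotide, dinucleotide, and trinucleotide frequency extraction, followed by one possible way to lay a feature vector out as a 36 × 32 grayscale grid. It is not the authors' implementation. The function names, the lexicographic k-mer ordering, the min-max scaling to 0–255, and the zero-padding are all assumptions made for this example; the paper's actual feature layout, its unit-circle mapping function, and its normalization are not given in this summary.

```python
from itertools import product

ALPHABET = "ACGT"


def kmer_frequencies(seq: str, k: int) -> list[float]:
    """Relative frequency of every k-mer over ACGT, in lexicographic order."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows containing N or other symbols
            counts[window] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in kmers]


def gene_frequency_features(seq: str) -> list[float]:
    """Concatenate 1-mer (4), 2-mer (16), and 3-mer (64) frequencies: 84 values."""
    return (kmer_frequencies(seq, 1)
            + kmer_frequencies(seq, 2)
            + kmer_frequencies(seq, 3))


def to_grayscale_grid(values: list[float], rows: int = 36, cols: int = 32) -> list[list[int]]:
    """Hypothetical layout: min-max scale to 0..255, fill row by row, zero-pad the rest."""
    size = rows * cols
    if not values or len(values) > size:
        raise ValueError(f"expected between 1 and {size} values")
    lo, hi = min(values), max(values)
    span = hi - lo
    scaled = [round(255 * (v - lo) / span) if span else 0 for v in values]
    scaled += [0] * (size - len(scaled))  # assumption: pad unused cells with 0
    return [scaled[r * cols:(r + 1) * cols] for r in range(rows)]


features = gene_frequency_features("ATGCGATACG")
grid = to_grayscale_grid(features)
```

In the paper, the 36 × 32 map also carries the protein sequence features (APAAC, positional variation, and PSSM-derived conservation) for both members of a pair, so a faithful reconstruction would need the full text rather than this padded single-sequence example.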

## Linked entities

- **Species:** Saccharomyces cerevisiae (taxon 4932), Drosophila melanogaster (taxon 7227), Homo sapiens (taxon 9606), Mus musculus (taxon 10090)

## Full-text entities

- **Genes:** MAPT (microtubule associated protein tau) [NCBI Gene 4137] {aka DDPAC, FTD1, FTDP-17, MAPTL, MSTD, MTBT1}
- **Diseases:** neurological disorders (MESH:D009461), cancer (MESH:D009369), injury to (MESH:D014947), bipolar disorder (MESH:D001714), immune dysfunction (MESH:D007154), DIP (MESH:C563663), cardiovascular diseases (MESH:D002318)
- **Chemicals:** amino acid (MESH:D000596), PPIs (-)
- **Species:** Mus musculus (house mouse, species) [taxon 10090], Saccharomyces cerevisiae (baker's yeast, species) [taxon 4932], Drosophila melanogaster (fruit fly, species) [taxon 7227], Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12938199/full.md

## Figures

15 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12938199/full.md

## References

59 references — full list in the complete paper: https://tomesphere.com/paper/PMC12938199/full.md

---
Source: https://tomesphere.com/paper/PMC12938199