# A CSGNN model-based method for essential protein identification

**Authors:** Zixuan Li, Zhiguo Yu, Peng Li

PMC · DOI: 10.3389/fbinf.2026.1731178 · Frontiers in Bioinformatics · 2026-03-16

## TL;DR

This paper introduces CSGNN, a graph neural network that identifies essential proteins by combining periodic gene expression data with a protein network weighted by Pearson correlation.

## Contribution

CSGNN (Correlation-guided Subgraph Graph Neural Network) integrates correlation-guided graph construction with attention-based representation learning to identify essential proteins.

## Key findings

- CSGNN consistently outperforms traditional baselines in identifying essential proteins on yeast and E. coli datasets.
- The model improves accuracy by leveraging multi-scale subgraph contexts (first-order and second-order neighborhoods) and attention-based representation learning.
- Integration of expression data and network correlations enhances prediction robustness.

## Abstract

Identification of essential proteins is fundamental for understanding cellular processes and disease mechanisms. However, many existing computational methods do not adequately model dynamic expression activity and often underutilize global network context, which limits prediction accuracy. To address these issues, we propose a Correlation-guided Subgraph Graph Neural Network (CSGNN) for essential protein identification by integrating correlation-guided graph construction with attention-based representation learning. First, we derive an activity-aware expression matrix from periodic gene expression patterns, and we construct a weighted protein network by computing Pearson correlation coefficients between gene pairs. This correlation-guided network further defines first-order and second-order neighborhoods, which provide multi-scale subgraph contexts for each protein. Next, we employ a two-layer attention-based graph convolution to learn node embeddings by aggregating information within these correlation-defined neighborhoods. Finally, we form an interaction-aware node representation by integrating each protein embedding with its neighborhood context, and we use a lightweight multilayer perceptron to output an essentiality probability for each protein. Proteins are then ranked by the predicted scores to identify essential candidates. Experiments on yeast and E. coli datasets demonstrate that CSGNN consistently outperforms traditional baselines, indicating improved accuracy and robustness for essential protein identification.
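The graph-construction step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `correlation_graph` and `neighborhoods`, the `threshold` parameter, and the rule of keeping only gene pairs whose absolute correlation reaches that threshold are all assumptions made for illustration, since the summary does not say how edges are selected. The toy expression matrix is invented.

```python
import numpy as np


def correlation_graph(expr, threshold=0.6):
    """Build a weighted adjacency matrix from expression profiles.

    expr: array of shape (n_genes, n_timepoints).
    threshold: hypothetical cutoff on |Pearson r| (an assumption, not from the paper).
    """
    corr = np.corrcoef(expr)          # Pearson correlation between rows (genes)
    np.fill_diagonal(corr, 0.0)       # no self-loops
    # Keep the correlation as the edge weight only where it is strong enough.
    return np.where(np.abs(corr) >= threshold, corr, 0.0)


def neighborhoods(weights):
    """Return first-order and second-order neighbor sets for every node."""
    adj = weights != 0
    hops = adj.astype(int)
    two_hop = (hops @ hops) > 0       # reachable by a path of length two
    np.fill_diagonal(two_hop, False)  # a node is not its own neighbor
    second = two_hop & ~adj           # exclude nodes that are already direct neighbors
    n = weights.shape[0]
    first_sets = {i: set(np.flatnonzero(adj[i]).tolist()) for i in range(n)}
    second_sets = {i: set(np.flatnonzero(second[i]).tolist()) for i in range(n)}
    return first_sets, second_sets


# Toy data: 4 genes x 4 time points (invented log-ratio profiles).
expr = np.array([
    [1.0, -1.0, 0.0, 0.0],
    [1.0, -1.0, 1.0, -1.0],
    [0.0, 0.0, 1.0, -1.0],
    [1.0, 1.0, -1.0, -1.0],
])
weights = correlation_graph(expr, threshold=0.6)
first, second = neighborhoods(weights)
print(first)
print(second)
```

In this toy case, gene 1 correlates with genes 0 and 2, while genes 0 and 2 do not correlate with each other, so each appears only in the other's second-order neighborhood. In the paper's pipeline, neighborhoods of this kind supply the multi-scale context over which the attention-based graph convolution aggregates information.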

## Full-text entities

- **Species:** Saccharomyces cerevisiae (baker's yeast, species) [taxon 4932], Escherichia coli (E. coli, species) [taxon 562]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC13033630/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC13033630/full.md

## References

37 references — full list in the complete paper: https://tomesphere.com/paper/PMC13033630/full.md

---
Source: https://tomesphere.com/paper/PMC13033630