# SGCP: a spectral self-learning method for clustering genes in co-expression networks

**Authors:** Niloofar Aghaieabiane, Ioannis Koutis

PMC · DOI: 10.1186/s12859-024-05848-w · BMC Bioinformatics · 2024-07-02

## TL;DR

This paper introduces SGCP, a new method for clustering genes in co-expression networks that improves biological relevance using self-learning and gene ontology data.

## Contribution

SGCP introduces a self-learning step using gene ontology to enhance module detection in co-expression networks.

## Key findings

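- On 12 real gene expression datasets, SGCP produces gene modules with higher GO enrichment than widely used existing frameworks.
- SGCP assigns the highest statistical importance to GO terms that mostly differ from those reported by the baselines, which suggests it surfaces different biological signal.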
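
To make "GO enrichment" concrete: a common way to score it is a one-sided hypergeometric test on the overlap between a module and the genes annotated with a GO term. The sketch below illustrates that general test only. This summary does not state which enrichment procedure the paper uses, and the function and parameter names are hypothetical.

```python
from math import comb


def go_enrichment_pvalue(n_background, n_annotated, n_module, n_overlap):
    """One-sided hypergeometric p-value, P(X >= n_overlap).

    n_background: number of genes in the background set
    n_annotated:  background genes annotated with the GO term
    n_module:     genes in the module being tested
    n_overlap:    module genes annotated with the GO term
    (All names are illustrative assumptions, not the paper's API.)
    """
    upper = min(n_annotated, n_module)
    total = comb(n_background, n_module)
    # Sum the probabilities of seeing n_overlap or more annotated genes
    # when drawing n_module genes without replacement.
    tail = sum(
        comb(n_annotated, i) * comb(n_background - n_annotated, n_module - i)
        for i in range(n_overlap, upper + 1)
    )
    return tail / total


# Toy numbers: 10 background genes, 3 annotated, a module of 3 genes,
# all 3 of which carry the annotation.
p_value = go_enrichment_pvalue(10, 3, 3, 3)
print(p_value)
```

A smaller p-value means the overlap is less likely under random sampling, so "higher GO enrichment" corresponds to smaller p-values, usually after multiple-testing correction.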

## Abstract

A widely used approach for extracting information from gene expression data employs the construction of a gene co-expression network and the subsequent computational detection of gene clusters, called modules. WGCNA and related methods are the de facto standard for module detection. The purpose of this work is to investigate the applicability of more sophisticated algorithms toward the design of an alternative method with enhanced potential for extracting biologically meaningful modules.

We present the self-learning gene clustering pipeline (SGCP), a spectral method for detecting modules in gene co-expression networks. SGCP incorporates multiple features that differentiate it from previous work, including a novel self-learning step that leverages gene ontology (GO) information. Compared with widely used existing frameworks on 12 real gene expression datasets, SGCP yields modules with higher GO enrichment. Moreover, SGCP assigns the highest statistical importance to GO terms that are mostly different from those reported by the baselines.

Existing frameworks for discovering clusters of genes in gene co-expression networks are based on relatively simple algorithmic components. SGCP relies on newer algorithmic techniques that enable the computation of highly enriched modules with distinctive characteristics, thus contributing a novel alternative tool for gene co-expression analysis.
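
The abstract describes SGCP as a spectral method for module detection. As a rough illustration of what spectral clustering on a co-expression network looks like in general, the following sketch builds an absolute-correlation adjacency matrix, embeds genes with the eigenvectors of the normalized graph Laplacian, and groups them with a plain k-means. It is a generic textbook pipeline under stated assumptions, not SGCP itself: the adjacency definition, the choice of `k`, and every function name here are assumptions, and the GO-based self-learning step is not reproduced.

```python
import numpy as np


def coexpression_adjacency(expr):
    """Absolute Pearson correlation between genes (rows), zero diagonal."""
    adj = np.abs(np.corrcoef(expr))
    np.fill_diagonal(adj, 0.0)
    return adj


def spectral_embedding(adj, k):
    """Rows of the k eigenvectors with smallest eigenvalues of the
    symmetric normalized Laplacian, scaled to unit length."""
    deg = adj.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
    norm_adj = adj * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    lap = np.eye(len(adj)) - norm_adj
    _, vecs = np.linalg.eigh(lap)  # eigenvalues come back in ascending order
    emb = vecs[:, :k]
    norms = np.linalg.norm(emb, axis=1, keepdims=True)
    return emb / np.maximum(norms, 1e-12)


def kmeans(points, k, n_iter=50, seed=0):
    """Minimal Lloyd's algorithm; returns a cluster label per point."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)].copy()
    for _ in range(n_iter):
        dist = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels


# Toy data: two groups of 10 genes that follow uncorrelated signals.
t = np.linspace(0.0, 2.0 * np.pi, 40, endpoint=False)
rng = np.random.default_rng(1)
group_a = np.sin(t) + 0.1 * rng.standard_normal((10, 40))
group_b = np.cos(t) + 0.1 * rng.standard_normal((10, 40))
expr = np.vstack([group_a, group_b])

adj = coexpression_adjacency(expr)
labels = kmeans(spectral_embedding(adj, k=2), k=2)
print(labels)
```

In practice the number of modules is not known in advance, so a real pipeline needs some criterion for choosing `k`; how SGCP does this is covered in the full paper rather than in this summary.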

## Full-text entities

- **Diseases:** coronary artery disease (MESH:D003324)
- **Species:** Rattus norvegicus (brown rat, species) [taxon 10116], Mus musculus (house mouse, species) [taxon 10090], Drosophila melanogaster (fruit fly, species) [taxon 7227], Homo sapiens (human, species) [taxon 9606]
- **Cell lines:** CEMiTool — Homo sapiens (Human), Childhood T acute lymphoblastic leukemia, Cancer cell line (CVCL_J653)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC11221046/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC11221046/full.md

## References

52 references — full list in the complete paper: https://tomesphere.com/paper/PMC11221046/full.md

---
Source: https://tomesphere.com/paper/PMC11221046