# scGGC: a two-stage strategy for single-cell clustering through cellular gene pathway construction

**Authors:** Zhi Zhang, Qiucheng Sun, Chunyan Wang, Songrun Jiang

PMC · DOI: 10.1093/bib/bbaf368 · 2025-07-23

## TL;DR

This paper introduces scGGC, a new method for clustering single-cell RNA sequencing data that improves accuracy and biological relevance by integrating graph autoencoders and adversarial networks.

## Contribution

scGGC's two-stage strategy first models cell–cell and cell–gene interactions in a single graph, then refines the resulting clusters with adversarial training, improving both clustering accuracy and marker gene identification.
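The summary does not say exactly how the joint graph is built. A minimal sketch of one possible construction follows. It puts both relationship types into one block adjacency matrix, assuming a cosine-similarity kNN graph for the cell–cell edges and per-gene rescaled expression for the cell–gene edges. The function name `build_cell_gene_adjacency` is hypothetical, as are the choices of similarity, `k`, and edge weights. scGGC's actual construction may differ.

```python
import numpy as np


def build_cell_gene_adjacency(expr: np.ndarray, k: int = 5) -> np.ndarray:
    """Combine a cell-cell kNN graph with weighted cell-gene edges.

    Hypothetical helper (assumed construction, not scGGC's code): `expr` is an
    (n_cells, n_genes) non-negative, normalized expression matrix; the result
    is a symmetric (n_cells + n_genes) x (n_cells + n_genes) adjacency matrix.
    """
    n_cells, n_genes = expr.shape

    # Cell-cell block: cosine similarity, keeping each cell's k nearest neighbours.
    unit = expr / np.maximum(np.linalg.norm(expr, axis=1, keepdims=True), 1e-12)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)  # a cell is never its own neighbour
    neighbours = np.argsort(-sim, axis=1)[:, :k]
    a_cc = np.zeros((n_cells, n_cells))
    a_cc[np.repeat(np.arange(n_cells), k), neighbours.ravel()] = 1.0
    a_cc = np.maximum(a_cc, a_cc.T)  # symmetrize the kNN graph

    # Cell-gene block: expression rescaled to [0, 1] per gene.
    b = expr / np.maximum(expr.max(axis=0, keepdims=True), 1e-12)

    # Assemble [[A_cc, B], [B^T, 0]]; the gene-gene block stays empty here.
    adj = np.zeros((n_cells + n_genes, n_cells + n_genes))
    adj[:n_cells, :n_cells] = a_cc
    adj[:n_cells, n_cells:] = b
    adj[n_cells:, :n_cells] = b.T
    return adj


rng = np.random.default_rng(0)
expr = rng.poisson(1.0, size=(20, 8)).astype(float)
adj = build_cell_gene_adjacency(expr, k=3)
print(adj.shape)  # (28, 28)
```

Placing cells and genes in one matrix is one way to let a graph encoder pass information along cell–gene paths and not only between neighbouring cells. Those paths are the global interactions that the abstract says conventional methods overlook.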

## Key findings

- scGGC outperforms eight existing methods on nine public scRNA-seq datasets; on datasets such as MHC3K, the Adjusted Rand Index (ARI) increases by an average of 10.1%.
- Marker gene overlap rates exceed 70% across multiple datasets, confirming the biological relevance of the clustering results.
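A short sketch can make the two headline metrics concrete. `adjusted_rand_index` implements the standard contingency-table formula for ARI and should agree with scikit-learn's `adjusted_rand_score`. `marker_overlap_rate` assumes that the overlap rate is the fraction of reference marker genes that also appear among the predicted markers; the summary does not state this definition. Both function names are hypothetical, and the gene names are placeholders.

```python
from math import comb

import numpy as np


def adjusted_rand_index(labels_true, labels_pred) -> float:
    """Adjusted Rand Index computed from the contingency table of two labelings."""
    _, t = np.unique(labels_true, return_inverse=True)
    _, p = np.unique(labels_pred, return_inverse=True)
    table = np.zeros((t.max() + 1, p.max() + 1), dtype=np.int64)
    np.add.at(table, (t, p), 1)  # count cells for each (true, predicted) pair

    # Numbers of same-cluster cell pairs: in both labelings, in truth, in prediction.
    pairs_both = sum(comb(int(n), 2) for n in table.ravel())
    pairs_true = sum(comb(int(n), 2) for n in table.sum(axis=1))
    pairs_pred = sum(comb(int(n), 2) for n in table.sum(axis=0))
    total = comb(len(t), 2)
    if total == 0:
        return 1.0
    expected = pairs_true * pairs_pred / total
    max_index = (pairs_true + pairs_pred) / 2
    if max_index == expected:  # only when both are one-cluster or both all-singletons
        return 1.0
    return (pairs_both - expected) / (max_index - expected)


def marker_overlap_rate(predicted_markers, reference_markers) -> float:
    """Share of reference markers recovered (assumed definition)."""
    reference = set(reference_markers)
    if not reference:
        return 0.0
    return len(set(predicted_markers) & reference) / len(reference)


print(round(adjusted_rand_index([0, 0, 1, 2], [0, 0, 1, 1]), 4))  # 0.5714
print(round(marker_overlap_rate(["GeneA", "GeneB", "GeneC"], ["GeneA", "GeneB", "GeneD"]), 2))  # 0.67
```

ARI corrects the raw pair-agreement count for the agreement expected by chance. A random labeling scores near 0 and a perfect one scores 1, whatever names the cluster labels carry.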

## Abstract

In the last few years, single-cell data analysis has advanced considerably, particularly in the development of clustering methods, and research on clustering algorithms tailored to single-cell RNA sequencing (scRNA-seq) data has grown accordingly. Conventional methods focus primarily on local relationships among cells or genes and overlook global cell–gene interactions; as a result, the high dimensionality, noise, and sparsity of the data continue to limit clustering accuracy. To address these challenges, we propose scGGC, a single-cell clustering model that integrates graph autoencoder and generative adversarial network techniques. scGGC makes two contributions: (i) it constructs an adjacency matrix that incorporates both cell–cell and cell–gene relationships, capturing complex interactions in a graph structure that a graph autoencoder then uses for nonlinear dimensionality reduction and initial clustering; and (ii) it improves clustering performance by selecting high-confidence samples from the initial clusters to train an adversarial neural network. A comprehensive evaluation on nine publicly available scRNA-seq datasets shows that scGGC outperforms eight comparison methods; on datasets such as MHC3K, for example, the Adjusted Rand Index increases by an average of 10.1%. Marker gene identification and cell type annotation further confirm the biological relevance of scGGC, with marker gene overlap rates exceeding 70% across multiple datasets. We conclude that scGGC not only improves the accuracy of single-cell data clustering but also enhances the identification of cell-type-specific marker genes. The scGGC code is available at https://github.com/Zhi1002/scGGC.
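The two stages in the abstract can be sketched as follows. This is a minimal illustration, not the scGGC implementation. It assumes a standard two-layer GCN encoder with an inner-product decoder (the graph autoencoder of Kipf and Welling) and uses untrained random weights. It also takes "high-confidence" to mean the cells closest to their cluster centroid in the embedding, which is one plausible criterion that the abstract does not specify. The median split stands in for the unspecified initial clustering. All function names are hypothetical, and the adversarial training stage is omitted.

```python
import numpy as np


def normalize_adjacency(adj):
    """Symmetric GCN normalization D^-1/2 (A + I) D^-1/2."""
    a_hat = adj + np.eye(adj.shape[0])  # self-loops keep every degree >= 1
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gae_encode(adj, features, w0, w1):
    """Two-layer GCN encoder Z = A_n ReLU(A_n X W0) W1 (weights untrained here)."""
    a_n = normalize_adjacency(adj)
    hidden = np.maximum(a_n @ features @ w0, 0.0)
    return a_n @ hidden @ w1


def reconstruct(z):
    """Inner-product decoder: sigmoid(Z Z^T) approximates the adjacency."""
    return 1.0 / (1.0 + np.exp(-z @ z.T))


def select_high_confidence(z, labels, frac=0.5):
    """Keep, per cluster, the fraction of points closest to the cluster centroid."""
    keep = []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        centroid = z[idx].mean(axis=0)
        dist = np.linalg.norm(z[idx] - centroid, axis=1)
        n_keep = max(1, int(np.ceil(frac * idx.size)))
        keep.extend(idx[np.argsort(dist)[:n_keep]])
    return np.sort(np.array(keep))


rng = np.random.default_rng(0)
n, f = 12, 6
upper = np.triu((rng.random((n, n)) < 0.3).astype(float), 1)
adj = upper + upper.T  # random symmetric graph without self-loops
x = rng.random((n, f))
z = gae_encode(adj, x, rng.normal(scale=0.5, size=(f, 8)), rng.normal(scale=0.5, size=(8, 2)))
labels = (z[:, 0] > np.median(z[:, 0])).astype(int)  # stand-in for the initial clustering
confident = select_high_confidence(z, labels, frac=0.5)
print(z.shape, reconstruct(z).shape)  # (12, 2) (12, 12)
```

In a full pipeline the encoder weights would be trained to make `reconstruct(z)` match the adjacency matrix. The indices in `confident` would then supply the labeled examples for the adversarial stage.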

## Full-text entities

- **Genes:** Cldn5 (claudin 5) [NCBI Gene 12741] {aka MBEC1, Tmvcf}, Upk3b (uroplakin 3B) [NCBI Gene 100647] {aka P35, PMS2L14, UpIIIb}
- **Diseases:** lung cancer (MESH:D008175), cancer (MESH:D009369)
- **Species:** Mus musculus (house mouse, species) [taxon 10090]

## Figures

8 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12284768/full.md

---
Source: https://tomesphere.com/paper/PMC12284768