# Gene print-based cell subtypes annotation of human disease across heterogeneous datasets with gPRINT

**Authors:** Ruojin Yan, Chunmei Fan, Shen Gu, Tingzhang Wang, Zi Yin, Xiao Chen

PMC · DOI: 10.1093/procel/pwaf001 · Protein & Cell · 2025-03-14

## TL;DR

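gPRINT is an algorithm that automatically annotates disease-specific cell subtypes (DSCSs) across heterogeneous single-cell datasets. It converts each cell's gene position and expression information into a voiceprint-like "gene print" and uses neural networks to map that pattern to a cell identity label.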

## Contribution

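gPRINT introduces a cell subtype annotation method inspired by speech recognition in noisy environments: it encodes gene position and gene expression as a per-cell "gene print" and integrates neural networks to mitigate the impact of background noise on cell identity label mapping.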

## Key findings

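- gPRINT achieved 98.37% annotation accuracy when externally validated on the same tissue, surpassing the other algorithms it was compared with.
- Results were reproducible across different donors, single-cell sequencing platforms, and disease subtypes.
- The approach was applied to fibrosis-associated diseases in multiple tissues and to fibroblast subtypes in tendon, where it automatically predicted tendinopathy-specific cell subtypes, key targets, and related drugs.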

## Abstract

Identification of disease-specific cell subtypes (DSCSs) has profound implications for understanding disease mechanisms, preoperative diagnosis, and precision therapy. However, achieving unified annotation of DSCSs in heterogeneous single-cell datasets remains a challenge. In this study, we developed the gPRINT algorithm (generalized approach for cell subtype identification with single cell’s voicePRINT). Inspired by the principles of speech recognition in noisy environments, gPRINT transforms gene position and gene expression information into voiceprints based on ordered and clustered gene expression phenomena, obtaining unique “gene print” patterns for each cell. Then, we integrated neural networks to mitigate the impact of background noise on cell identity label mapping. We demonstrated the reproducibility of gPRINT across different donors, single-cell sequencing platforms, and disease subtypes, and its utility for automatic cell subtype annotation across datasets. Moreover, gPRINT achieved higher annotation accuracy of 98.37% when externally validated based on the same tissue, surpassing other algorithms. Furthermore, this approach has been applied to fibrosis-associated diseases in multiple tissues throughout the body, as well as to the annotation of fibroblast subtypes in a single tissue, tendon, where fibrosis is prevalent. We successfully achieved automatic prediction of tendinopathy-specific cell subtypes, key targets, and related drugs. In summary, gPRINT provides an automated and unified approach for identifying DSCSs across datasets, facilitating the elucidation of specific cell subtypes under different disease states and providing a powerful tool for exploring therapeutic targets in diseases.
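As a rough illustration of the "gene print" idea described in the abstract, the sketch below orders one cell's genes by genomic position and averages expression over consecutive bins to produce a fixed-length, position-ordered signal. This is not the published gPRINT implementation: the function name `gene_print`, the `n_bins` parameter, and the bin-averaging step are all assumptions made for illustration, and the neural network stage is omitted.

```python
import numpy as np

def gene_print(expression, chrom_index, start_pos, n_bins=8):
    """Hypothetical sketch: position-ordered, binned expression signal for one cell.

    expression  : 1D array of expression values, one per gene
    chrom_index : 1D integer array giving each gene's chromosome rank
    start_pos   : 1D integer array giving each gene's start coordinate
    n_bins      : length of the output signal (illustrative parameter)
    """
    expression = np.asarray(expression, dtype=float)
    # np.lexsort uses the LAST key as the primary key, so this sorts
    # by chromosome first and by start coordinate within each chromosome.
    order = np.lexsort((start_pos, chrom_index))
    ordered = expression[order]
    # Split the ordered genes into consecutive, roughly equal bins
    # and average each bin (an assumed smoothing step).
    bins = np.array_split(ordered, n_bins)
    return np.array([b.mean() if b.size else 0.0 for b in bins])

# Toy cell with four genes on two chromosomes (made-up values).
demo = gene_print(
    expression=[5.0, 0.0, 3.0, 1.0],
    chrom_index=np.array([2, 1, 1, 2]),
    start_pos=np.array([100, 50, 10, 20]),
    n_bins=2,
)
print(demo)
```

In this toy case the genes are reordered to expression values 3, 0, 1, 5, so the two bins average to 1.5 and 3.0. A fixed-length signal of this kind could then be passed to any classifier; the choice of network architecture is left open here because the summary does not specify it.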

## Linked entities

- **Diseases:** tendinopathy (MONDO:0100010)

## Full-text entities

- **Diseases:** tendinopathy (MESH:D052256), fibrosis (MESH:D005355)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12342163/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12342163/full.md

## References

49 references — full list in the complete paper: https://tomesphere.com/paper/PMC12342163/full.md

---
Source: https://tomesphere.com/paper/PMC12342163